PREFACE


This book has been written primarily for prospective teachers who want to know how mental tests can be of help in their school work. It can serve also as a guide for teachers in service. The first seven chapters describe the varieties of mental tests and point out the usefulness and limitations of each sort. The last three chapters deal with the writing of objective items, with the construction of classroom tests, and with some of the ways in which mental tests can usefully be employed in guidance and counselling.

Chapter 2 covers in summary fashion the statistical terms and procedures most often used with mental tests. I do not believe it possible to describe mental tests intelligently without using relevant statistical terms. At the same time, I think that the classroom teacher need not be a psychometrician or testing specialist in order to use standard tests in the school. For those who want to go further into test construction, there is an Appendix which treats statistical method more fully.

I have found it generally better to teach Chapter 2 before taking up a discussion of mental tests themselves, that is, to use it as a preliminary to later chapters. Chapter 2 can then be referred to specifically when the various statistical terms occur. This procedure has the advantage of reviewing the basic statistics when the need arises.

I believe that the book will be found to contain ample material for one term's work. This is especially true when the laboratory exercises and questions at the ends of the chapters are covered in class discussion, and when reports upon relevant literature are required.

Henry E. Garrett
CHAPTER 1


MENTAL TESTS IN THE SCHOOLS 


The Teacher and Mental Tests


The widespread use of standard tests in today's schools renders it increasingly necessary for the classroom teacher to be familiar with these devices, with what they are and what they do. Teachers are often required to administer and score tests, and frequently to use these scores in the evaluation of pupil capabilities and future promise. This is essential, of course, if the standard test is to have value in the work of the school. Most teachers, however, have no desire to become testing specialists or psychometricians, and many have little knowledge of modern statistical method. For these reasons, books dealing chiefly with the statistics of test construction and with other technical problems, while a necessary part of the training of school and clinical psychologists, are often not very useful to the teacher. In fact, they may leave him more confused than enlightened.

This book is planned to present a comprehensive account of standard tests for teachers and for others not planning to become specialists in this field. It is not a book on statistical method; it does not deal broadly with the history of testing, nor with the applications of tests to problems of business and industry. Instead, it describes the various sorts of tests, their uses and abuses, and how they supplement and aid the work of the classroom. Statistical terms necessary to an understanding of the tests themselves are defined and illustrated, but detailed calculations are not included in the text. The book's usefulness will be enhanced if the exercises and topics at the ends of the chapters are carefully worked through. It is highly desirable, too, that the instructor have the class examine, take, and score a number of tests. The discussion in a chapter will be clearer when the class has actual familiarity with the tests described.


What Mental Tests Are


In a mental test, the examinee is confronted with a variety of tasks: questions to be answered, problems to be solved, directions to be followed. Answers may be given orally, in writing, and sometimes by marking or by manual manipulation, as, for example, by fitting blocks into apertures. Mental tests differ from physical tests, though there is considerable overlap between the two sorts of measurement. Both varieties of test require previous learning, and both present problems, but the mental test, to a greater degree than the physical, demands verbal abstraction rather than action, ideas rather than muscles. Tests of physical fitness (of height, weight, and physical strength, for example) differ most markedly from mental tests; in other words, they are the most purely physical. Tests which require speed and accuracy of hand-eye or hand-ear co-ordination, and which demand manual dexterity and skill (called sensory-motor tests), are both mental and physical. But none of these tests is as “mental” as the intelligence test or a school examination in algebra or history, since none of them depends to so large a degree upon verbal symbols.

The term mental test is sometimes restricted to the measurement of intelligence or aptitude, examinations in school subjects being classified as educational achievement tests. The reasoning here is that the mental test (the intelligence test, for example) tells us how much a child can learn, whereas the school examination tells us what he has already learned. To some extent this is true, but the distinction between the two sorts of measurement is one of degree rather than of kind. No mental test measures potential ability except by way of performance. We possess no microscope by which we can discover the inherited qualities of a child's brain or nervous system. The general intelligence test measures potential ability to a greater degree than the school examination does, because it draws more upon native alertness than upon routine school learning. But the school examination also draws upon native alertness as expressed in school learning, and both sorts of test demand the use of symbols: words, diagrams, numbers, pictures. Accordingly, in this book the term mental test will be used to describe both sorts of examination.

The primary objective of a mental test is to detect individual differences, that is, to discover how one child compares or “stacks up” against another child of the same age, sex, or grade classification. This knowledge, as we shall see later, is useful in many ways in school and out. A second objective of the mental test is to discover intra-individual differences, or the variations in performance within an individual. The scores made by an examinee, when put in comparable units and represented on a profile, provide a useful record of the examinee's strengths and weaknesses.


A Classification of Mental Tests 


In beginning the study of mental tests, it will be helpful to draw up a list of the different varieties of tests. Most widely used tests are standardized for procedure and results. A standardized educational achievement test, for example, is one that has been constructed in accordance with the best principles of test making and has been administered to hundreds of pupils in those grades for which the test is suitable. Results from standard tests are expressed in terms of norms. These are typical scores earned by large groups of children believed to be representative of various ages and grades. For example, a score of 45 on a standard reading test may be the norm for children 9 years, 6 months old, or for children who are just beginning the fourth grade.

The following outline gives some notion of the field to be covered and at the same time furnishes an overview of the chapters to follow.

VARIETIES OF MENTAL TEST
I. Intelligence Tests
(1) individual: administered to one examinee at a time
(2) group: administered, like a school examination, to many examinees at the same time
(3) performance: make little or no use of language, in contrast with the paper-and-pencil tests in (1) and (2)
II. Educational Achievement Tests
(1) survey: comprehensive examinations used to determine general academic standing
(2) subject: examinations in specific fields, for example, physics or Spanish
(3) diagnostic: cover a wide range of academic skills (in reading or arithmetic, for example) and are designed to reveal specific weaknesses and strengths
III. Aptitude Tests
(1) general: for example, of
    (a) mechanical ability
    (b) clerical ability
(2) special: aptitude for school subjects, for example, chemistry or foreign languages; differential aptitudes
(3) professional: for example, in
    (a) law
    (b) medicine
    (c) engineering
    (d) teaching
(4) talent: aptitude in such fields as
    (a) art
    (b) music
IV. Tests of Various Aspects of Personality
(1) personal adjustment questionnaires: surveys of worries, fears, social inadequacies
(2) attitude surveys: upon, for example, social, economic and political questions
(3) inventories of interests as related to various occupations
(4) environmental factors related to personality: questionnaires covering socio-economic background and other variables
(5) projective techniques: subtle and indirect measures of dominant personality trends


All these mental tests will be treated in subsequent chapters. The following sections of this chapter provide a brief outline of the development of psychological testing in order to clear the ground for later work. For a more complete discussion of the historical development of mental tests, the student should consult the references at the end of this chapter.


The Beginnings of Mental Tests 


Interest in psychological testing developed in Germany and France about the middle of the last century. This interest grew out of the acute need for a better understanding of feeble-mindedness and the various forms of insanity. Tests were devised for the purpose of determining what the feeble-minded person can learn, how much he can learn, and in what respects he differs most drastically from the normal. In the case of the insane and the mentally deteriorated, brief tests were drawn up for assessing loss of memory, distortions of perception, distractibility, mental fatigue, and changes in such sensory-motor functions as speed and accuracy of motor responses.

In England, interest in mental testing arose from the study of individual differences in mental and physical functions. The leader in this movement was Sir Francis Galton, an eminent geneticist, who set up a testing laboratory in London in 1882. Here, for a small fee, a person could have the keenness of his vision and hearing tested, as well as his muscular strength and his speed and co-ordination of response. Galton's tests were quite brief and sampled rather narrow aspects of behavior. In fact, they were sensory-motor rather than strictly mental in character. One of the first American psychologists to become interested in mental testing was James McKeen Cattell. Cattell introduced mental tests of the Galton type in this country at the turn of the century.


Intelligence Tests: Individual


The individual intelligence test as we know it today grew out of the work of Alfred Binet, a French psychologist who was director of the laboratory for physiological psychology at the Sorbonne. In 1904 Binet was asked to devise a mental test suitable for use in detecting slow learners in the schools of Paris. The test was to be used not only to sift out the subnormal children in the grades but also to provide a better understanding of degrees of feeble-mindedness, with a view to improving the education of these children. In 1905, Binet, with a collaborator, Theophile Simon, brought out the first scale for measuring intelligence. This scale consisted of thirty problems and questions arranged in order from easy to hard. A second edition of Binet's scale appeared in 1908, and a third and final edition in 1911. These tests differed sharply from those of Galton. Binet was interested in determining the intellectual level of school children, not (as was Galton) in studying differences among individuals in fairly narrow mental and sensory-motor functions. In order to measure intelligence, Binet believed he must devise tests which would measure a child's memory, his comprehension and judgment, and his insight. He avoided questions which demanded specific and routine school learning. For example, instead of asking the examinee the product of 6 × 3 or the name of the largest city in France, Binet asked the child to repeat four digits (single numbers) or the words of a sentence (heard only once); to tell the “thing to do” in specific problem situations; to criticize (“see through”) an absurd statement or fallacy; to give differences between, for instance, a president and a king; and to define abstract words like justice and loyalty.

Binet's famous tests became the basis for the widely used Stanford Revision of the Binet-Simon Scale, described in Chapter 3. To Binet belongs the credit for having set up the first “age scale,” that is, a test series in which items are arranged or grouped by age levels. A child's “score” on an age scale is determined by the level attained and is expressed as a mental age (MA), which denotes the child's level of mental maturity.

Children of preschool age are unable to do tests which require reading and word knowledge. For these children, therefore, as well as for children handicapped in speech, vision, or hearing, and for the non-English speaking, performance tests must be used. In a typical performance test, the child is asked to identify common objects, string beads, or build towers of blocks; or he may be asked to fit blocks into cutouts, arrange pictures in sequence, or match the colors of cubes. Performance tests have been devised for use with illiterate and less intelligent adults as well as with children.


Intelligence Tests: Group 


When intelligence tests are administered to large groups of examinees at the same time, they are appropriately called “group tests.” The first group tests were developed (in 1917) during World War I. Together with other information, these tests were used (1) in accepting or rejecting men, (2) in the classification of those accepted, (3) in the assignment of draftees to various types of service, and (4) in determining admission of candidates to officer training schools. There were two kinds of group test, called Army Alpha and Army Beta. The first was intended for soldiers who could read and write; it required that an examinee follow fairly involved directions, solve “mental arithmetic” problems, know the meanings of words, and perceive relations (for example, in an analogies test the question might be as follows: Hand is to foot as glove is to ?). Army Beta was a non-language or non-verbal test. It made use only of diagrams, pictures, and numbers and was answered by a simple system of marking. Army Beta was administered to the illiterate and the foreign-born. Directions were given in pantomime for the benefit of those soldiers who did not understand English.

During World War II, a group intelligence test called the Army General Classification Test (AGCT) was administered to some 17,000,000 men. AGCT is a verbal or language test. It includes three sorts of materials: verbal (vocabulary), numerical (arithmetic problems), and spatial (for example, problems in spatial relations presented by pictures of block piles to be “counted” by the examinee). No specific “school” questions were asked, since the test was designed to measure mental alertness in dealing with symbolic materials apart from specific training. Both Alpha and AGCT are still used in the testing of adults.

Between World Wars I and II, scores of group tests of intelligence were constructed and used widely in the schools and colleges. These and other mental tests (aptitude, personality) have been widely employed in business and industry as an aid in the selection and placement of personnel.

In most group intelligence examinations, items are answered by marking one of several possible solutions (multiple-choice), by selecting one of two answers (true-false), or by checking or underlining the appropriate reply among several options. These answer techniques are called “objective” (p. 185), because in scoring such tests the judgment of the examiner does not enter in, or does so to a very slight degree. Group tests of intelligence are treated in Chapter 4.
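Objectivity in scoring can be illustrated with a short program. The sketch below is only an illustration under stated assumptions, not a description of any published test: the answer key is hypothetical (a mix of multiple-choice letters and true-false marks), and the function name `score_paper` is invented here. A response earns credit only when it matches the key exactly, so any two scorers using the same key arrive at the same score.

```python
from typing import Optional

# Hypothetical answer key: items 1, 3, 5 are multiple-choice; items 2 and 4 are true-false.
ANSWER_KEY = ["b", "T", "d", "F", "a"]

def score_paper(responses: list[Optional[str]], key: list[str] = ANSWER_KEY) -> int:
    """Count the responses that agree with the key.

    Omitted items (None) earn no credit. No judgment by the scorer is
    involved: the score depends only on the paper and the key.
    """
    if len(responses) != len(key):
        raise ValueError("the paper and the key must have the same number of items")
    return sum(1 for given, correct in zip(responses, key) if given == correct)

# One pupil's paper: item 2 is wrong and item 5 is omitted.
paper = ["b", "F", "d", "F", None]
print(score_paper(paper))  # 3
```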


Educational Achievement Tests 

Since World War I, a number of tests of educational achievement have been constructed on objective principles. These tests are used to determine general educational level or standing, as well as knowledge of a given subject field, for example, geometry or French. The general survey test, when used in the elementary school, is a comprehensive examination of the student's knowledge of reading, spelling, arithmetic, grammar and literature, history, and elementary science. Tests in separate subjects (history or physics, for example) are also available at educational levels from the secondary school to college. Educational achievement tests are called diagnostic when they are used to reveal a student's weaknesses in a particular area such as arithmetic or reading. Diagnostic tests must of necessity cover a wide range of information and skills in a given subject. Educational achievement tests are described in Chapter 5.


Aptitude Tests


Tests designed to discover whether a student is talented in music or art, say, or whether a young man has the knack for dealing with tools and mechanical contrivances are called aptitude tests. Aptitude may be inferred (1) from the degree of mastery attained in a “new” subject after a period of study. Aptitude for a foreign language, for instance, is demonstrated in the ease with which the subject (Spanish, for example) is acquired after a term's work. Achievement tests, given after a period of “exposure,” reveal this aptitude directly. Aptitude is also inferred before a period of study by testing (2) to see whether an examinee possesses those abilities and skills judged to make for success in a given subject (for example, physics) or in a profession (for example, medicine or law). Aptitude for physics is gauged by finding how well the student has learned the mathematics necessary for work in physics; aptitude for law is judged by the student's ability to read difficult prose, comprehend fairly involved legal arguments, and follow a line of reasoning to a conclusion. What are called “differential aptitude tests” are designed to assess a student's strengths and weaknesses in certain fundamental abilities believed to be crucial in a number of activities, in and out of school.

Tests of general mechanical aptitude sample performance in a number of activities believed to demonstrate mechanical knowledge and skill. Factors measured by these tests include familiarity with tools, insight into mechanical relations (pulleys, levers, and the like), ability to solve problems expressed in diagrams of machines and mechanical contrivances, and interest in mechanical things, as shown by the reading of popular science, building radios, tinkering with cars, and so on. Manipulative tasks and mechanical gadgets have been employed to test for special abilities in a variety of situations. Among the traits studied are manual dexterity, sensory-motor skills, and visual and auditory acuity, all of which are needed in many jobs in industry and in the armed forces.

Clerical aptitude tests cover the knowledge and skills needed in a business office. Tests under this head provide scores from which we can predict an examinee's ability to carry out the written work of an office: to spell, check records, and read and write easily and accurately.

Aptitude tests of a special sort have been devised for inferring talent in art and music. In music, for instance, many of the factors needed for success can be measured: “ear” for music, rapid and accurate reading of music at sight, and knowledge of harmony and other technical phases of music. In art, “taste” for color, form, symmetry, and other artistic dimensions is determined by comparing a student's judgments with those of acknowledged experts. Knowing whether a person possesses talent in art or music is often highly important in educational and vocational guidance.

Aptitude tests are treated in Chapter 6. 


Personality Tests


Psychologists have used the questionnaire or inventory to determine personality factors in three areas: (a) personal adjustment, (b) attitudes, and (c) interests. In addition, questionnaires have been used in the social sciences to survey socio-economic, home, and community phenomena. “Tests” of personality are in reality standard interviews designed to reveal characteristic ways of behaving. The personal adjustment questionnaire or personal data sheet inquires into a person's fears, worries, anxieties, and home and work adjustments. Such inventories are often appropriately called “trouble sheets.” In some cases, the questions are direct and undisguised: “Are you afraid of high places?” “Can you stand the sight of blood?” “Do your parents treat you right?” In other adjustment inventories, questions are disguised and indirect, so that the intent of the question may not be understood by the examinee. A technique often used in such inventories is that of “forced choices” (p. 168).

Attitude questionnaires attempt to reveal systematic ways of 
behaving or thinking about social, religious, or political matters. 
Can a student be classified as narrow- or broad-minded, religious 
or irreligious, or somewhere between these extremes? Attitude 
inventories try to answer these questions. 

Interest inventories survey a person's interests in books, sports, people, occupations, social activities, and the like. An examinee's pattern of interests may serve to identify him with some well-defined occupational group, for example, lawyers or chemists. Or a young man's interests may identify him with some area of interests, such as science, business, or social service. Interest tests are especially valuable in counseling, since interest, as much as ability, may determine a student's educational or vocational choices.

Another group of personality tests makes use of what have been called “projective” techniques. Projective tests are disguised interviews in which an examinee is asked what he “sees” in some neutral situation, an ink blot or a picture, for example. These tests are perhaps most useful in the diagnosis of disturbed mental states. They must be administered by an expert and are employed mostly by psychiatrists and clinical psychologists in cases of severe behavior problems.

The techniques of the personality questionnaire have been widely used in polls conducted to assess public opinion about such things as political issues and social questions. Inventories have been employed, too, to survey systematically the association between items in a constellation of attitudes or opinions: for instance, the association of social and economic background, or of preferences for political candidates, with views about social and economic issues. In sociological studies in which environmental factors loom large, the kind of home from which a child comes, the educational and occupational status of the parents, and the character of the community may be revealed by a systematic survey of background variables. Personality tests are treated in Chapter 7.


How Mental Tests Are Used in the Schools 


As we have said, the primary function of the mental test is to reveal individual differences. More specifically, mental tests are useful to the teacher in three ways. First, mental tests aid in the evaluation of class performance in relation to established norms (p. 115). Second, tests reveal the strengths and weaknesses of individual pupils; that is, they are useful in educational diagnosis (p. 116). Finally, tests enable the teacher to discover whether a pupil possesses aptitude for a given subject or course of study, and to predict his probable success in college or professional school. We shall consider these three objectives in the chapters to follow.


SUGGESTIONS FOR FURTHER READING 


Comprehensive accounts of the development of mental testing and of the application of tests in various areas will be found in the references below.


Anastasi, A. Psychological Testing. New York: Macmillan, 1954. 


Freeman, F. S. Theory and Practice of Psychological Testing. (Rev. ed.) New York: Holt, 1955.

Ross, C. C., and Stanley, J. C. Measurement in Today’s Schools. (3rd 
ed.) New York: Prentice-Hall, 1954. 


Thorndike, R. L., and Hagen, Elizabeth. Measurement and Evaluation 
in Psychology and Education. New York: Wiley, 1955. 
CHAPTER 2


STATISTICS IN MENTAL TESTING 


The purpose of this chapter is to acquaint the prospective user of mental tests with those statistical terms and techniques most often used in testing. Stress throughout the chapter is on the meaning and significance of symbols and terms rather than on the mechanics of computation. For the latter, the student should consult the Appendix as well as the books on statistical method listed at the end of this chapter.

Perhaps the best advice one can offer the teacher who is planning to use mental tests is that he first take a course in statistics. For students who have been wise enough to do so, the present treatment will constitute simply a brief review and summary. For those who have had no statistical training, it will provide the minimum essentials for the understanding and evaluation of mental tests themselves.


THE FREQUENCY DISTRIBUTION 


Drawing Up a Frequency Distribution 


Suppose that a teacher has administered a test of English grammar to fifty children in the seventh grade. The papers have been marked and the names and scores of the children recorded. Two questions ordinarily arise: (1) What is the typical performance of the class? and (2) What is the range of talent in the class? To answer these questions, we may organize and present the fifty scores in one of several ways.

Table 2-1 is a systematic tabulation of the fifty English grammar scores into what is called a frequency distribution.

The fifty scores have been grouped, from high to low, into intervals five score units wide, listed under the heading “Scores.” In the frequency column headed “f” are listed the numbers of scores which fall into each interval. For example, five children score in the interval 60-64, eight in the interval 55-59, and so on down to four who score in the bottom interval, 30-34.
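The tallying just described can be sketched in a few lines of Python. The raw scores below are invented for illustration (the fifty actual scores behind Table 2-1 are not given), and the function name and its `low` and `width` parameters are assumptions of this sketch: scores are grouped into intervals five units wide beginning at 30, and the intervals are listed from high to low as in the table.

```python
from collections import Counter

def frequency_distribution(scores, low=30, width=5):
    """Group integer scores into equal intervals; return (lower, upper, f) rows, high to low.

    Each interval is labelled by its score limits, e.g. (30, 34) when low=30 and width=5.
    """
    if not scores:
        raise ValueError("no scores to tabulate")
    if min(scores) < low:
        raise ValueError("a score falls below the bottom interval")
    counts = Counter((s - low) // width for s in scores)  # interval index for each score
    rows = []
    for k in range(max(counts), -1, -1):  # highest interval first
        lower = low + k * width
        rows.append((lower, lower + width - 1, counts.get(k, 0)))
    return rows

# Ten hypothetical scores (not the data of Table 2-1).
sample = [31, 33, 38, 44, 46, 47, 47, 52, 58, 63]
for lower, upper, f in frequency_distribution(sample):
    print(f"{lower}-{upper}  {f}")
```

An interval that happens to receive no scores is still listed, with f = 0, so the table has no gaps.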

TABLE 2-1
Frequency Distribution of Fifty Scores on a Test of English Grammar

Scores      f
60-64       5
55-59       8
50-54      10
45-49      12
40-44       6
35-39       5
30-34       4
        N = 50

A test score is always taken to represent a distance along some scale of ability running from low to high. Thus, a score of 46 covers the span from 45.5 to 46.5, 46.0 itself being the middle of the score interval. Other scores have the same meaning: in each case the score covers the distance from .5 unit below to .5 unit above the face value of the given score. This definition of a score means, of course, that the interval 30-34 begins at 29.5 and ends at 34.5, that the interval 35-39 begins at 34.5 and ends at 39.5, and so on. For convenience in writing, the intervals in Table 2-1 are given as score limits rather than exact limits. In each case, however, the exact limits of the intervals are understood.
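The rule for exact limits is simple enough to state as a function. In this sketch the helper name is hypothetical; each score X is taken to cover X - .5 to X + .5, so an interval's exact limits lie half a unit outside its score limits. The frequencies are those of Table 2-1.

```python
# Table 2-1: (lower score limit, upper score limit, f), listed high to low.
TABLE_2_1 = [(60, 64, 5), (55, 59, 8), (50, 54, 10), (45, 49, 12),
             (40, 44, 6), (35, 39, 5), (30, 34, 4)]

def exact_limits(lower_score: int, upper_score: int) -> tuple[float, float]:
    """A score X covers X - .5 to X + .5, so the interval extends half a unit beyond its score limits."""
    return lower_score - 0.5, upper_score + 0.5

for lower, upper, f in TABLE_2_1:
    lo, hi = exact_limits(lower, upper)
    print(f"{lower}-{upper}: exact limits {lo}-{hi}, f = {f}")
```

Note that adjacent intervals share an exact limit (34.5 ends 30-34 and begins 35-39), so the intervals cover the scale without gaps.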


Graphic Representation of the Frequency Distribution

A frequency distribution may be represented graphically by a frequency polygon, as shown in Figure 2-1.

FIGURE 2-1 Frequency Polygon of Fifty Scores Achieved by Seventh-Grade Children on a Test of English Grammar (X-axis: Scores)

In the construction of a frequency polygon, scores are laid off along the baseline, or X-axis, at equal intervals, and the frequencies (f's) are plotted on the vertical or Y-axis. Each f is plotted directly above the midpoint of the interval upon which it falls. The four scores falling in the first grouping, 30-34, are plotted above 32, the midpoint of the interval. In the other intervals (reading up), 5 scores are plotted above 37, the midpoint of 35-39, 6 above 42, 12 above 47, and so on. The points are joined with short straight lines to give the outline of the frequency polygon.
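The points of Figure 2-1 can be computed directly from Table 2-1. This minimal sketch (the function name is an assumption) takes each interval's midpoint as the average of its exact limits and pairs it with the interval's frequency, reading up from the lowest interval; joining these points with straight lines gives the polygon.

```python
# Table 2-1: (lower score limit, upper score limit, f), listed high to low.
TABLE_2_1 = [(60, 64, 5), (55, 59, 8), (50, 54, 10), (45, 49, 12),
             (40, 44, 6), (35, 39, 5), (30, 34, 4)]

def polygon_points(table):
    """Return (midpoint, f) pairs in order of increasing score."""
    points = []
    for lower, upper, f in sorted(table):  # sorts by lower limit, lowest interval first
        # Exact limits are lower - .5 and upper + .5; the midpoint is halfway between them.
        midpoint = ((lower - 0.5) + (upper + 0.5)) / 2
        points.append((midpoint, f))
    return points

print(polygon_points(TABLE_2_1))
# [(32.0, 4), (37.0, 5), (42.0, 6), (47.0, 12), (52.0, 10), (57.0, 8), (62.0, 5)]
```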

A frequency polygon shows graphically how the scores are spread over the test scale from low to high. From Figure 2-1 it is apparent that more children scored in the middle of the scale (see, for example, the 12 in interval 45-49) than at either extreme. Rules for constructing a frequency polygon so as to provide a good picture of the test data will be found in the Appendix.

Another way of representing a frequency distribution graphically is the histogram. Figure 2-2 represents the f's on the score intervals by small rectangles set up over each interval. For the first interval, the rectangle is four Y-units high; for the second interval, five Y-units high; and so on. The highest rectangle, 12 units on the Y-axis, is above interval 45-49.

FIGURE 2-2 Histogram of Fifty Scores Achieved by Seventh-Grade Pupils on a Test of English Grammar

The histogram and frequency polygon represent the same facts, and there is little to choose between them. Frequency polygons are to be preferred to histograms when two distributions are plotted on the same axes, since in the histogram the vertical and horizontal lines often coincide, making the figures difficult to disentangle.
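A rough histogram can be drawn even without graph paper. The sketch below (its name and the choice of `#` as the bar symbol are assumptions) turns Figure 2-2 on its side: each interval of Table 2-1 gets one row, and the length of the bar equals f, just as the height of each rectangle does in the figure.

```python
# Table 2-1: (lower score limit, upper score limit, f), listed high to low.
TABLE_2_1 = [(60, 64, 5), (55, 59, 8), (50, 54, 10), (45, 49, 12),
             (40, 44, 6), (35, 39, 5), (30, 34, 4)]

def text_histogram(table, symbol="#"):
    """One row per interval, highest first; each bar has f symbols."""
    return "\n".join(f"{lower}-{upper} | {symbol * f} {f}" for lower, upper, f in table)

print(text_histogram(TABLE_2_1))
```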


The Normal Curve 


The symmetrical bell-shaped graph shown in Figure 2-3 is the well-known normal curve. This “ideal” frequency polygon is the mathematical model which many distributions of actual scores approximate. (See, for example, Figure 2-1.) The normal curve is often called the normal probability curve because it shows the probability of occurrence of scores of different size when these are determined by a large number of independent and randomly combined factors.

FIGURE 2-3 The Normal Curve
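For reference, the normal curve has a standard equation. Written with N for the number of cases, σ for the standard deviation (defined later in this chapter), and, as an assumption about notation, M for the mean, the height y of the curve above a score X is given below; the total area under the curve equals N.

```latex
y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - M)^{2}}{2\sigma^{2}}}
```

The curve is highest at X = M and falls off symmetrically on either side; about 68 per cent of the area lies between M - σ and M + σ.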
The normal curve has played an important role in the development of mental measurement. Among its uses in testing may be mentioned the following:


1. Selecting the Items of a Test. When the distribution of test scores for a class is badly off-center or “skewed,” as shown in Figures 2-4 and 2-5, the test is not suitable for the group. In Figure 2-4 the test is too easy: there are too many high scores. In Figure 2-5 the test is too hard: there is a disproportionate number of low scores.

FIGURE 2-4 Negatively Skewed Curve

FIGURE 2-5 Positively Skewed Curve

When the test maker takes the normal curve as his model, questions and problems are carefully selected and their scoring adjusted to give a symmetrical arrangement of test scores like that of the normal curve. This means that most pupils score near the middle of the scale, with progressively fewer scoring toward the high and low ends. Note that, according to the criterion of normality, the frequency polygon in Figure 2-1 shows the English grammar test to be generally satisfactory, though perhaps a bit too easy.


2. Scaling the Obtained Scores from a Test. Raw or obtained scores from a test are usually expressed by an arbitrary number of points. Scores of this sort do not represent equal steps or equal units along some ability scale; and since there is no true zero point, a score of 40 is not twice as good as a score of 20. When point scores are transformed into deviations from the average or mean, and expressed in units of the standard deviation (page 36) of the group, they are called sigma-scores. The unit of deviation (the standard deviation) is usually represented by the Greek letter σ (sigma). Sigma-scores may later be converted into standard scores (page 38). Many educational achievement and aptitude tests publish norms (page 40) in terms of standard scores. These scores are comparable from test to test when distributions are normal, or approximately so.

Point scores may be changed over directly into equal-unit scores in a normal distribution. Such "normalized" scores have several advantages (page 40).


3. Determining the Stability of a Test Score. An obtained score on a test (for example, a group test IQ) can be expected to vary somewhat up or down when the test is administered a second time. The variation to be expected in a score, that is, its probable stability, can be predicted from tables of the normal probability curve (page 23).
Once the distribution has been tabulated, we are ready to compute a typical measure or average. There are three sorts of averages, also called measures of central tendency, in common use.

The Mean (M)


Given a set of ten scores, 10, 9, 10, 12, 8, 6, 4, 7, 5, and 4, the mean is simply 7.5, found by adding the scores (75) and dividing this sum by their number (10). The M is popularly called the average. When scores have been grouped into a frequency distribution, as shown in Table 2-1 on page 14, a slightly different method is employed in finding the M. (See Appendix.) But the M is always essentially the sum of the scores divided by their number.
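
A minimal sketch of the mean for ungrouped scores, using the ten scores above (the grouped-data method of the Appendix is not reproduced here; the function name is illustrative):

```python
def mean(scores: list[float]) -> float:
    """Arithmetic mean: the sum of the scores divided by their number."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


scores = [10, 9, 10, 12, 8, 6, 4, 7, 5, 4]
print(sum(scores), mean(scores))  # 75 7.5
```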


The Median (Mdn) 


When scores are arranged in order of size, another sort of average, the median (Mdn), is the point in the distribution found by counting off one-half of the scores from either end of the series. We usually start with the low end. For example, for the five scores 7, 8, 9, 10, and 12, the median or mid-score is 9: there are two scores above and two below it. When the number of scores is even (for example, 5, 7, 8, 9, 10, and 12), there is no mid-score, and the median is midway between the two middlemost scores, namely, at 8.5. When scores are grouped into a frequency distribution, as shown in Table 2-1 on page 14, the median is still the 50 per cent point: the point found by counting 50 per cent of the way into the distribution. For a method of computing the median, see the Appendix.
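
The counting rule for the median of ungrouped scores can be sketched as follows; the function name is illustrative only, and grouped data are left to the Appendix method.

```python
def median(scores: list[float]) -> float:
    """Point that divides the ordered scores into two equal halves."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: the middle score itself
    # even N: no mid-score, so take the point midway between the two middlemost
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 8, 9, 10, 12]))     # 9
print(median([5, 7, 8, 9, 10, 12]))  # 8.5
```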


The Mode 


That score in a set of scores which occurs most frequently is called the crude mode, or the modal score. The crude mode is a third sort of average. In Table 2-1 the crude mode is taken at 47, the midpoint of the interval which contains the largest frequency. The mode can be computed more exactly, but usually we simply take the most often recurring score as the crude mode without further refinement. In most cases, the mode is a preliminary measure of central tendency. For exploratory purposes it does not need to be computed so precisely as the mean or median.
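
A sketch of the crude mode for grouped scores, taken as the midpoint of the interval with the largest frequency. The frequencies 4, 5, 6, and 12 for the four lowest intervals come from this chapter (the cumulation of Table 2-2 and the histogram of Figure 2-2); the three upper frequencies (10, 7, 6) are hypothetical fill-ins chosen only so that N = 50.

```python
def crude_mode_grouped(intervals: list[tuple[int, int]], freqs: list[int]) -> float:
    """Midpoint of the class interval holding the largest frequency.
    If two intervals tie, the lower one is returned."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    low, high = intervals[i]
    return (low + high) / 2


intervals = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59), (60, 64)]
freqs = [4, 5, 6, 12, 10, 7, 6]  # last three values are hypothetical
print(crude_mode_grouped(intervals, freqs))  # 47.0
```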


MEASURES OF VARIABILITY 


The Range 


It is sometimes more important to know the variability of a set of scores than to know the mean or median. Suppose, for example, that two sections of Grade 7 have the same mean but differ markedly in spread of talent, as evidenced in the variability of scores around the mean. Figure 2-6 shows two distributions of this sort: the scores in A range from 40 to 60, whereas the scores in B range from 20 to 80. The difference between the high and low scores in the A distribution is 20 points; in the B distribution, 60 points. The range is the most general index of variability. Other, more exact measures are the standard deviation (written as SD or σ) and the quartile deviation (written as Q).

FIGURE 2-6 Two Distributions with the Same Mean but Differing Markedly in Range (Variability)
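
The range is simply the highest score minus the lowest. In this sketch the score lists are hypothetical: only their end points (40 to 60 and 20 to 80) follow Figure 2-6, and they are built so that both sections have the same mean.

```python
def score_range(scores: list[int]) -> int:
    """Difference between the highest and lowest scores."""
    return max(scores) - min(scores)


# Hypothetical scores for two sections with the same mean.
section_a = [40, 45, 48, 50, 50, 52, 55, 60]
section_b = [20, 30, 45, 50, 50, 55, 70, 80]
print(score_range(section_a), score_range(section_b))  # 20 60
print(sum(section_a) / len(section_a), sum(section_b) / len(section_b))  # 50.0 50.0
```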
The Standard Deviation (σ)

The mean of the set of five scores 12, 10, 8, 6, and 4 is 8. If 8 is subtracted from each score, we have 12 − 8 = 4, 10 − 8 = 2, 8 − 8 = 0, 6 − 8 = −2, and 4 − 8 = −4. The size of a deviation from M tells the extent to which the individual score deviates from the common mean; and the sign of the deviation indicates its direction from M. If each deviation is now squared, we have 4² = 16, 2² = 4, (−2)² = 4, and (−4)² = 16. The square of 0 is, of course, 0. The sum of these squared deviations is 40, and σ, the standard deviation, is defined as

$$\sigma = \sqrt{\frac{\sum (\text{deviations})^2}{N}}$$

or, in our example, σ = √(40/5) = √8 = 2.83.*


Squaring the separate deviations around the M eliminates the minus signs and gives extra weight to extreme deviations. An SD or σ is judged to be large or small (to reflect much or little variation) in relation to other SD's computed for the same test. For example, if 35 boys and 42 girls have the same M on a history test but the boys' σ is 10 and the girls' σ is 6, we know that the boys' scores spread more than the girls' up and down the scale, in both directions from the mean.
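
A sketch of the standard deviation as defined above, dividing by N as the text does (rather than by N − 1), applied to the five scores of the example. The function name is illustrative.

```python
import math


def standard_deviation(scores: list[float]) -> float:
    """Square root of the mean squared deviation from M (divides by N)."""
    m = sum(scores) / len(scores)
    squared_deviations = [(x - m) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / len(scores))


scores = [12, 10, 8, 6, 4]
print(round(standard_deviation(scores), 2))  # 2.83
```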

In a normal curve, σ provides valuable information concerning the way in which the separate measures fall around the common mean. In Figure 2-7, for example, +3σ is seen to include virtually all the measures above the M, and −3σ all of the measures below the M. The total area of the normal curve is taken as N. From tables of the area of the normal curve, we know that between M and +1σ lie approximately 34 per cent of the measures (actually 34.13 per cent), and between M and −1σ also lie 34 per cent of the measures. The two "halves" of the curve are equal. Hence we find about 68 per cent of the measures, roughly two thirds, between −1σ and +1σ. Furthermore, from tables we find that about 14 per cent of the measures fall between +1σ and +2σ in the normal curve and about 2 per cent between +2σ and +3σ. The same proportions hold, of course, for the half of the curve to the left of the M, since the M divides the area of the normal curve into two equal parts.

FIGURE 2-7 Areas Under the Normal Curve

* For calculation of σ from a frequency distribution, see Appendix.
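
Rather than consulting printed tables, the areas quoted above can be computed from the normal distribution function. This sketch uses Python's `math.erf`; `normal_cdf` is an illustrative name.

```python
import math


def normal_cdf(z: float) -> float:
    """Proportion of the normal curve's area lying below z (in σ units)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for lo, hi in [(0, 1), (1, 2), (2, 3)]:
    share = normal_cdf(hi) - normal_cdf(lo)
    print(f"M+{lo}σ to M+{hi}σ: {100 * share:.2f} per cent")
print(f"-1σ to +1σ: {100 * (normal_cdf(1) - normal_cdf(-1)):.2f} per cent")
```

The four lines show about 34.13, 13.59, 2.14, and 68.27 per cent, which the text rounds to 34, 14, 2, and 68.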

The relations of σ to the total area (N) in the normal curve model hold approximately for distributions which resemble the normal curve in form. An illustration will make clear how the normal curve model is used in such cases. Suppose that on a reading test administered to sixty children in the fifth grade, the M = 62 and σ = 8; and suppose further that the frequency polygon of these scores closely resembles the normal curve in form. Taking the normal curve as our model, then, we can say that approximately two thirds of the scores (that is, forty) fall between 54 and 70 (62 ± 8). Moreover, about 14 per cent of the scores, or about 8, will fall between 70 and 78 (between +1σ and +2σ), and about 2 per cent, or 1 or 2 scores, will fall between 78 and 86 (that is, between +2σ and +3σ). In the lower half of the distribution, 14 per cent, or 8 scores, will fall between 54 and 46, and 2 per cent between 46 and 38. These relationships are shown in Figure 2-8. Note that M, the reference point, is 62 and that σ is 8 units on the test scale.

FIGURE 2-8 Use of Normal Curve Model to Show Distribution of Sixty Scores on a Reading Test
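
The reading-test illustration can be reproduced numerically. The sketch below assumes, as the text does, that the sixty scores follow the normal curve with M = 62 and σ = 8; it gives expected counts under that model, not actual counts.

```python
import math

M, SIGMA, N = 62, 8, 60  # reading-test figures from the text


def normal_cdf(z: float) -> float:
    """Proportion of the normal curve's area lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def expected_count(low_score: float, high_score: float) -> float:
    """Expected number of the N pupils scoring between two points,
    assuming the scores are normally distributed."""
    z_low = (low_score - M) / SIGMA
    z_high = (high_score - M) / SIGMA
    return N * (normal_cdf(z_high) - normal_cdf(z_low))


for low, high in [(54, 70), (70, 78), (78, 86), (46, 54), (38, 46)]:
    print(f"{low}-{high}: about {expected_count(low, high):.1f} pupils")
```

The expected counts (about 41, 8.2, 1.3, 8.2, and 1.3) match the text's "forty," "about 8," and "1 or 2."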


The Quartile Deviation (Q)

Just as we compute the median by counting off 50 per cent of the scores, so we can count off 25 per cent of the scores from the low end of the distribution (that is, 25 per cent of N) to locate Q1, the first quartile point. Similarly, we can count off 75 per cent of the scores from the low end of the distribution to locate Q3, the third quartile. The gap between Q3 and Q1 is called the interquartile range, or range of the middle 50 per cent. Q, the semi-interquartile range, is computed thus:

$$Q = \frac{Q_3 - Q_1}{2}$$

Like σ, Q is a measure of variability; but, unlike σ, it is found by counting into the distribution, whereas σ is computed from the squared deviations taken around the M. When the median is the measure of central tendency, we generally use Q; when M is the measure of central tendency, we use σ.

Methods of computing Q from a frequency distribution will be found in the Appendix; here we are concerned primarily with the meaning of Q as a measure of variability. Q's usefulness will become clearer when we have plotted the percentile curve, or ogive, as shown in the next section.


PERCENTILES AND PERCENTILE RANK 


Table 2-2 shows the frequency distribution of Table 2-1, with the addition of two columns in which the f's have been cumulated.


TABLE 2-2

Frequency Distribution and Cumulated Frequencies of Fifty Scores on an English Grammar Test
(Data are the fifty scores in Table 2-1.)

(1)         (2)    (3)       (4)
Scores      f      Cum. f    Cum. %
60-64
55-59
50-54
45-49       12     27        54
40-44        6     15        30
35-39        5      9        18
30-34        4      4         8


In column (3), scores have been added progressively (cumulated) from the bottom to the top of the distribution. On the first interval, 4 is the entry; 4 + 5 on the next interval gives 9; 9 + 6 on the third interval gives 15; and so on. In column (4), these cumulated scores are expressed as percentages of N. In Figure 2-9, cumulated f's in percentages have been plotted against the score-intervals laid off along the baseline. As scores are added over each interval, each % cum. f is plotted directly above the upper limit of the interval upon which it falls. The resulting S-shaped curve is called an ogive, or cumulative frequency graph. The ogive constricts or expands the scale of scores into a scale of one hundred points, called a percentile scale. The median and the Q can be read from the ogive almost as accurately as they can be computed from a frequency distribution.

FIGURE 2-9 Ogive or Cumulative Frequency Curve

To illustrate, if a line is run from the 50 per cent point on the percentage scale across to the curve, a perpendicular dropped from this point to the score-scale locates Mdn at 49 approximately. (The computed value is 48.66.) The twenty-fifth percentile, or Q1, is located from the ogive at 42 approximately; and the seventy-fifth percentile, or Q3, at about 55. Other percentile points can be located in the same manner by going from the appropriate point on the vertical percentage scale across to the ogive and dropping a perpendicular to the baseline. Note that the distance from Q3 to Q1 (that is, 55 − 42) is the interquartile range, or range of the "middle 50." One-half of this distance is 13/2, or 6.5, which is the quartile deviation, or Q. The larger the Q of a distribution, the greater the spread of the middle 50 per cent of scores along the scale and the larger the variability.
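
Reading a percentile point from the ogive amounts to linear interpolation within the interval where the desired percentage is reached. This sketch assumes the frequencies used above for the mode (4, 5, 6, and 12 from the chapter; 10, 7, and 6 hypothetical). The median and Q1 depend only on the lower intervals and on N, so they agree with the text (about 48.67, given in the text as 48.66, and about 42). Q3, and hence Q, depend on the assumed upper frequencies; Q comes out near 6.2 rather than the 6.5 obtained from the rounded readings 55 and 42.

```python
def percentile_point(p: float, lower_limits: list[float],
                     freqs: list[int], width: float) -> float:
    """Score below which p per cent of the N cases fall, interpolating
    linearly within the interval (as when reading across to the ogive)."""
    n = sum(freqs)
    target = p / 100 * n
    cum = 0
    for low, f in zip(lower_limits, freqs):
        if f and cum + f >= target:
            return low + (target - cum) / f * width
        cum += f
    return lower_limits[-1] + width


# Exact lower limits of the intervals 30-34, 35-39, ..., 60-64.
lower_limits = [29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
freqs = [4, 5, 6, 12, 10, 7, 6]  # last three values are hypothetical
mdn = percentile_point(50, lower_limits, freqs, 5)
q1 = percentile_point(25, lower_limits, freqs, 5)
q3 = percentile_point(75, lower_limits, freqs, 5)
print(round(mdn, 2), round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))
```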

A pupil's percentile rank (PR) is the position on the percentile scale (on a scale of one hundred points) to which his score entitles him. Suppose that Tom Brown achieves a score of 40 on our English grammar test. What is his PR? Going out to 40 on the score-scale on the baseline, up to the curve, and then across to the Y-scale, we locate Tom's PR at about 20. This PR tells us at once that about 20 per cent of the pupils scored lower than Tom. If Mary Green scores 58 on the grammar test, her PR is read at approximately 84, and 84 per cent of the class made lower scores than she did. Scores achieved on tests expressed in different units (for example, a reading test and an arithmetic test) cannot be compared directly. But the relative positions (PR's) of a child in his classes can be quickly determined and compared when both sets of scores have been converted into a common percentile scale. Moreover, several PR's may be combined to give a general index.
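
The same interpolation run in reverse gives a percentile rank: the percentage of cases lying below a score. With the same partly hypothetical frequencies, a score of 40 gets a PR of about 19, in line with Tom's PR of about 20. The PR for Mary's 58 (about 84 here) depends on the assumed upper frequencies. The function name is illustrative.

```python
def percentile_rank(score: float, lower_limits: list[float],
                    freqs: list[int], width: float) -> float:
    """Per cent of the N cases lying below `score`, assuming the cases
    in each interval are spread evenly across it."""
    n = sum(freqs)
    below = 0.0
    for low, f in zip(lower_limits, freqs):
        if score >= low + width:
            below += f  # the whole interval lies below the score
        elif score > low:
            below += (score - low) / width * f  # part of the interval lies below
    return 100 * below / n


lower_limits = [29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
freqs = [4, 5, 6, 12, 10, 7, 6]  # last three values are hypothetical
print(round(percentile_rank(40, lower_limits, freqs, 5), 1))  # 19.2
print(round(percentile_rank(58, lower_limits, freqs, 5), 1))  # 83.8
```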


CORRELATION* 


The relationship between two sets of test scores can be described mathematically by the coefficient of correlation between them. Correlation is expressed by a decimal fraction (called r), which may vary along a scale from .00 to ±1.00. Let us suppose that tests in English grammar and in history have been administered to the same seventh-grade class. Suppose further that children who score high in the English test tend to score high in history, and that children scoring fairly high or quite low in English tend to score fairly high or quite low in history. When this happens, the coefficient of correlation between the two sets of scores will be marked or substantial, for example, .60 to .70. Now suppose that most pupils who score high in English grammar score only average in arithmetic. The correlation between these two areas would then be lower, perhaps no more than from .20 to .30. If those pupils who score high in English grammar tend to earn very low scores on a test in shop work, the correlation here would be close to zero, or perhaps negative.

Positive coefficients of correlation run from .00 to +1.00; good scores in the one test go with good scores in the other. Negative coefficients of correlation run from .00 to −1.00 and denote an inverse relationship: good scores in the first test go with poor scores in the second. Zero correlation denotes no correlation at all between the two variables.

* See Appendix for computation of a correlation coefficient.
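
The Appendix gives the book's own computing routine for r. As an illustration only, the sketch below computes the product-moment coefficient directly from deviations around the two means; the scores and the function name are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment r: the sum of deviation cross-products divided by the
    square root of the product of the two sums of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five pupils on two tests.
english = [45, 52, 38, 60, 49]
history = [40, 50, 35, 58, 47]
print(round(pearson_r(english, history), 2))
```

Because the deviations are divided by their own spread, r is unaffected by the units in which either test is scored.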

Whether a correlation coefficient is to be regarded as high or low depends upon a number of factors. The correlation of height with weight in school children is generally high, around .70 for a given age level. The correlation of a good intelligence test with school grades will fall typically between .50 and .70; and the correlations between personality traits (from questionnaires) and school achievement are usually low and often negative. The following table will aid in interpreting coefficients of correlation:

r's from .00 to ±.20       very low; negligible
r's from ±.20 to ±.40      low; present but slight
r's from ±.40 to ±.70      substantial or marked
r's from ±.70 to ±1.00     high to very high

When computing the correlation between two forms of the same test (the self-correlation of the test), we demand much higher r's than are found typically between different variables.


The Reliability of a Test

The Reliability Coefficient. One important application of correlation to mental testing is in the determination of the reliability of a test. Test reliability refers to the stability of test scores. If a child achieves a score of 48, for example, on a highly reliable test of general science, subsequent scores earned by this pupil upon equivalent forms of this test should not differ greatly from the initial score of 48. But if the test is unreliable, in repeated testing the score may vary widely from its first determination.

The reliability coefficient of a test is found by computing the self-correlation of the test. Suppose that a reading examination has been given to five sixth-grade classes and that two weeks later the same test or an equivalent form is administered to the same classes. If the correlation between these two administrations of the test is high (a reliability coefficient of .90 or more is considered high), we may feel confident that scores earned by pupils in these classes are reasonably accurate measures of "true" ability.

Test reliability is sometimes determined by repeating a test and correlating the second set of scores against the first set. This method is followed when there is only one form of the test. More often, an equivalent or parallel form of the test is given, and the reliability coefficient is the r between the test and its alternate form. The reliability coefficients of many standard educational tests have been determined in this way. Other ways of determining test reliability will be found in the references at the end of the chapter. The authors of standard tests will usually specify what method has been used in computing the reliability of their tests.


The Standard Error of a Score 

The accuracy or precision of an individual score is perhaps best expressed by the standard error of a score, which is also called the standard error of measurement. The SE (standard error) is calculated from the following formula:

$$SE_{\text{score}} = \sigma \sqrt{1 - r_{11}}$$

where σ is the standard deviation of the test scores and r₁₁ is the reliability coefficient of the test. Suppose that the σ of a set of test scores is 10 and the reliability coefficient (r₁₁) is .95. Then the SE of a score on this test is SE = 10√(1 − .95), or 2.2. This may be interpreted to mean that should a child take this test a second time, the chances are good (about 7 in 10) that his "new" score will not diverge by more than ±2 points from the true determination. The SE of a Stanford-Binet IQ is 4.5 points for IQ's from 90 to 110. In other words, if the test is repeated, we can expect a child's IQ to stay within 4 to 5 points of its true value, again with chances of about 7 in 10.
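
A sketch of the standard-error formula, applied to the example above (σ = 10, r₁₁ = .95). The function name is illustrative.

```python
import math


def standard_error_of_score(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r11)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


se = standard_error_of_score(10, 0.95)
print(round(se, 2))  # 2.24
```

The result, 2.24, is the value the text rounds to 2.2.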

Reliability coefficients of standard intelligence and educational achievement tests are generally above .90 for large groups of pupils. The size of the reliability coefficient depends upon several factors: the variability of the group, the length of the test, and the method used in determining reliability. A reliability coefficient of .50 in a single grade or class may indicate as much stability of score as a reliability coefficient of .90 in a large group, because scores within a single class vary much less. The great advantage of the SE of a score is that it takes account of both the reliability coefficient and the variability (SD) in the group. (See page 56.)
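
To show why a low reliability coefficient in a single class can go with as much score stability as a high one in a large group, the sketch below compares two SE's. The σ values (5 for the single class, 11.2 for the large group) are hypothetical, chosen only to make the point.

```python
import math


def standard_error_of_score(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r11)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical: a single class with little spread, and a large,
# heterogeneous group with much more spread.
single_class = standard_error_of_score(sd=5.0, reliability=0.50)
large_group = standard_error_of_score(sd=11.2, reliability=0.90)
print(f"single class SE = {single_class:.2f}, large group SE = {large_group:.2f}")
```

Both SE's come out at about 3.5 points, so scores in the two situations would be equally stable despite the very different reliability coefficients.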


The Validity of a Test

A mental test is a valid testing device when it measures what it claims to measure. Tests are not valid for all areas and all situations, but are valid in certain defined situations and for certain behaviors. A group intelligence test, for example, is not a valid measure of emotional control or of delinquent behavior. Validity may be classified, for convenience, into three sorts: experimental, content, and predictive. The validity of an intelligence test is determined experimentally by computing the test's correlation with various criteria: school grades, ratings for mental alertness, and other measures of intellect, to mention a few. Many of the best tests of general intelligence have been validated against the Stanford-Binet, the best-known individual intelligence test (page 47). Aptitude tests (for example, those of clerical and mechanical aptitudes) are validated against demonstrated proficiency in office work or in mechanical tasks. Independent measures against which tests are validated are called criteria. Criteria do not represent entirely adequate or completely sufficient determinations of a trait. Insofar as criteria incorporate valuable aspects of the behavior we are studying, however, they represent variables with which a test, to be valid, must correlate positively.

Tests of educational achievement in history, mathematics, languages, and the like possess content validity in that test questions sample the subject matter areas directly. Content validity alone is not a sufficient index of a test's usefulness. Such considerations as choice of items, extent of sampling, form in which items are put, and level of difficulty are also very important. But content validity is a necessary first step. Intelligence and aptitude tests possess content validity insofar as the items in them fulfill the author's definition of what he is measuring. Such asserted or "face" validity, however, is never as convincing as is the content validity of the educational achievement test. Generally, tests of intelligence and of aptitude must depend for their validity upon correlations with independent criteria judged to be dependable indices of the trait under study.

Predictive validity is the degree to which a test or test battery is related to some criterion of future performance or measure of success which will become available in the future. The predictive validity of a good group intelligence test for school performance ranges from about .40 to .60. (See page 96.) Many short tests have low correlations with a criterion, but when put with other tests into a team they combine forces to raise the correlation of the battery with the criterion. Validity coefficients do not run as high as do reliability coefficients, since no test can correlate higher with other tests than with measures of itself.

Personality questionnaires, interest blanks, and attitude scales have content validity insofar as choice of items is concerned. Such instruments are usually validated experimentally against objective expressions of interest, indices of neurotic behavior, and the like.


Practical Considerations in the Choice of a Test 


There are a number of factors which enter into the choice of a mental test besides validity and reliability. Some of the more important are the following:

1. Appearance: Is the test format good? Are the items attractively presented and arranged?

2. Administration: How much time is required to give and 
score the test? What is the cost? 

3. Manual: Does the author give full accounts of reliability and 
validity—how found, upon what samples, of what sorts? 
Are instructions clear? 

4. Norms: Are the test norms readily interpreted? Are age and 
grade equivalents given? What type of scaling is used? 


SCALING TEST SCORES 


The purpose of scaling is (1) to revamp the raw test scores into a scale of equal units, and (2) to enable us to combine sub-tests into a single index. It is sometimes important (especially with aptitude tests) to compare relative performances, and this can be done only when tests are expressed in equal units. The score on a test, when expressed simply in number of items done correctly, is an aggregate of arbitrary points. Pupils can be ranked in order of merit for such aggregates, but such "scores" do not constitute a scale. There are several methods for scaling raw scores.

The Age Scale

When test scores are expressed in MA (mental age) units, they form an age scale. Mental age is the chronological age which corresponds to or is typical of a given test score. The MA of 9-6, for example, represents the performance of the average child who is 9 years and 6 months old. Thus if Dick achieves an MA of 9-6 on the Stanford-Binet, this mental age is a measure of his intellectual status or degree of mental growth.

If Dick's life age (CA) is 10-4, his IQ is 92. IQ = MA/CA, and in our example is 114 mos./124 mos., or .92; the decimal point is dropped, giving 92. IQ is a measure of a child's brightness relative to that of other children of his age. When MA and CA are equal, the IQ is 100, the brightness index of the average child. Dick's IQ of 92 means that he is somewhat less bright than the typical child of his age level. IQ's above 100 are achieved by bright children, those whose mental growth runs ahead of their years. IQ's below 100 indicate that a child is below normal, and very low IQ's (70 or below) imply feeblemindedness.
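
A sketch of the ratio IQ for Dick's figures, with ages converted to months. Rounding 100 × MA/CA to the nearest whole number is an assumption here; it reproduces the text's 92. The function names are illustrative.

```python
def to_months(years: int, months_part: int) -> int:
    """Convert an age written as years-months (e.g., 9-6) to months."""
    return 12 * years + months_part


def ratio_iq(mental_age_months: int, chronological_age_months: int) -> int:
    """IQ = 100 * MA / CA, rounded to the nearest whole number."""
    return round(100 * mental_age_months / chronological_age_months)


ma = to_months(9, 6)   # 114
ca = to_months(10, 4)  # 124
print(ma, ca, ratio_iq(ma, ca))  # 114 124 92
```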

The age scale is used in most individual intelligence scales and by many group tests of general intelligence. The MA and IQ were first widely used to measure performance on the Stanford-Binet test, which was constructed so as to meet the requirements necessary to yield a constant ratio score, or IQ. Many group tests do not meet these requirements. It is wise, therefore, to accept IQ's from group tests as tentative indices of brightness, not always closely related to IQ's from the Stanford-Binet.


The Percentile Scale 

We have already seen (page 25) how obtained scores can be fitted into a scale of one hundred units to yield a percentile scale. The PR (percentile rank) of a score, its position on the percentile scale, can be computed from the frequency distribution of scores. But the simplest plan is to plot an ogive (see Figure 2-9, page 26) and read the PR from the graph. The PR of any score then becomes the percentage of the distribution which lies below the score. This method is not accurate beyond the first decimal, but it is sufficiently precise for many purposes. It is easy to apply and requires a minimum of calculation. Table 2-3 gives the frequency distribution of 180 scores on a clerical aptitude test earned by students enrolled in several courses in a business college.

TABLE 2-3

Frequency Distribution of 180 Scores Achieved on a Clerical Aptitude Test

Scores       Midpoints    PR's of Midpoints
194-196      195
191-193      192
188-190      189
185-187      186
182-184      183
179-181      180
176-178      177
173-175      174
170-172      171

The ogive in Figure 2-10 has been plotted from the cum. f's in Table 2-3, following the method of page 26. In the last column are entered the PR's of the midpoints of the successive score-intervals. The midpoints [...] 192, 189, 186, 183, 180, 177, 174, [...]. A student who earns a score of 191, 192, or 193, that is, who falls in the interval next to the top, receives a PR of 9[...], the PR of the midpoint of this interval. These midpoint PR's [...] spelling. If his PR's in [...], they can be represented comparatively on a profile as shown in Figure 2-11.

FIGURE 2-10 Cumulative Frequency Polygon (Ogive) of 180 Scores Achieved on a Clerical Aptitude Test

FIGURE 2-11 Profile of the Percentile Ranks in Various Subjects for a Given Child

This graph permits a comparison of the child's achievement in the five subjects. It is clear that he is satisfactory in arithmetic (PR = 60) and science (PR = 55), above average in history (PR = 60), average in English (PR = 50), and below average in spelling (PR = 45). Comparisons of this sort cannot be made from raw scores. One disadvantage of the PR scale is the fact that units are not equal at the extremes of the scale. When PR's are below 20 or above 80, they must be compared (or combined) with caution (see refs.).
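
Because the frequencies of Table 2-3 are not given above, this sketch uses hypothetical frequencies for the nine intervals (N = 180). It assumes the common rule for the PR of an interval midpoint: the cases below the interval plus half of the interval's own cases, as a percentage of N. That rule follows from defining a PR as the percentage of the distribution lying below a score, if the cases in each interval are spread evenly across it.

```python
def midpoint_prs(freqs_low_to_high: list[int]) -> list[float]:
    """PR of each interval midpoint: cases below the interval plus half
    of the interval's own cases, as a percentage of N."""
    n = sum(freqs_low_to_high)
    prs, below = [], 0
    for f in freqs_low_to_high:
        prs.append(100 * (below + f / 2) / n)
        below += f
    return prs


midpoints = [171, 174, 177, 180, 183, 186, 189, 192, 195]
freqs = [4, 10, 18, 30, 40, 34, 24, 14, 6]  # hypothetical, N = 180
for mp, pr in zip(midpoints, midpoint_prs(freqs)):
    print(mp, round(pr))
```

With these invented frequencies, a pupil scoring 191, 192, or 193 would be assigned the PR of the midpoint 192, about 93.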


Sigma-scores and Standard Scores 


We have seen that one way of converting raw scores into a scale is by means of percentile ranks. Another method of scaling is to express the deviation of each test score from the common mean in units of SD, thus putting all scores into σ-units. Such "deviation" scores are called σ-scores and sometimes z-scores. The following is an illustration of the method of converting obtained scores into σ-scores.

Table 2-4 gives the M's and σ's earned by fifty sixth-grade pupils on five objective educational achievement tests. At the bottom of the table are listed the scores achieved by two children, Mary and Howard.


TABLE 2-4

M's and σ's Earned on Five Objective Tests of Educational Achievement Given in the Sixth Grade

                  (1) Arith.  (2) Arith.
                  Reas.       Comp.       (3) Reading  (4) Grammar  (5) Science
Mean              62          124         43           28           46
σ                 10          20          7            4            8
Mary's scores     57          119         50           31           36
Howard's scores   62          144         41           26           49

From an inspection of these scores, it is clear that Mary is below the class mean in arithmetical reasoning, arithmetical computation, and science, but is above the mean in reading and grammar. Howard, on the other hand, is exactly on the mean in arithmetical reasoning, above the mean in computation and science, and slightly below the mean in reading and grammar. These comparisons are useful, but because of differences in the units in which test scores are expressed, we cannot (1) compare Mary's and Howard's scores in the several tests, except to point out that they are above or below the mean, nor (2) combine either pupil's scores into a single meaningful index of academic achievement. Conversion of test scores into σ-units will permit us to carry out both these operations.
The formula for a σ-score or z-score is

$$z = \frac{X - M}{\sigma} = \frac{x}{\sigma}$$

where (X − M) = x. Mary earned a score of 57 in arithmetical reasoning. This score deviates −5 points from the mean (57 − 62 = −5). If we divide this deviation of −5 by 10 (the σ), we have −.50 as Mary's σ-score in arithmetical reasoning. In Test 2, arithmetical computation, Mary's σ-score is (119 − 124)/20, or −.25. Her other σ-scores are computed in the same way; those that are plus are above the mean, those minus below the mean. Mary's five σ-scores are shown below:

Test:               (1)     (2)     (3)     (4)     (5)
Mary's σ-scores    −.50    −.25    1.00     .75   −1.25

Howard's σ-scores are found as were Mary's. In Test 1, his score of 62 is exactly on the mean, and his σ-score is .00. In Test 2, arithmetical computation, Howard's σ-score is (144 − 124)/20, or 1.00. Howard's scores are below the mean in Tests 3 and 4, and his σ-scores there are minus. His σ-scores are tabulated below:

Test:               (1)     (2)     (3)     (4)     (5)
Howard's σ-scores   .00    1.00    −.29    −.50     .38
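
The σ-score computation for both pupils, using the means and σ's of Table 2-4, can be sketched as follows (the constant and function names are illustrative):

```python
# Table 2-4, tests (1) to (5): Arith. Reas., Arith. Comp., Reading, Grammar, Science
MEANS = [62, 124, 43, 28, 46]
SIGMAS = [10, 20, 7, 4, 8]


def sigma_scores(raw: list[float]) -> list[float]:
    """z = (X - M) / σ for each test."""
    return [(x - m) / s for x, m, s in zip(raw, MEANS, SIGMAS)]


mary = sigma_scores([57, 119, 50, 31, 36])
howard = sigma_scores([62, 144, 41, 26, 49])
print([round(z, 2) for z in mary])    # [-0.5, -0.25, 1.0, 0.75, -1.25]
print([round(z, 2) for z in howard])  # [0.0, 1.0, -0.29, -0.5, 0.38]
```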


It is apparent that σ-scores are simply plus or minus deviations from the test mean expressed in σ units. A practical disadvantage of such scores is the fact that they are small decimal fractions and are about as often minus as plus. For greater convenience, therefore, σ-scores are usually converted into a distribution of standard scores with an assigned M and σ. M's and σ's often selected are M = 100, σ = 20; M = 500, σ = 100; and M = 10, σ = 3.

If Mary's and Howard's scores are converted into a standard score distribution with M = 100 and σ = 20, we have the following:

Tests                       (1)   (2)   (3)   (4)   (5)   Total   Mean
Mary's standard scores       90    95   120   115    75     495     99
Howard's standard scores    100   120    94    90   108     512    102


In the first test, Mary's σ-score is −.50, or .50 of σ below the
M. In our new distribution (M = 100, σ = 20), the equivalent
standard score is one-half of σ below 100, or 90. In Test 2,
Mary's standard score is ¼ of σ below the mean of 100, or 95
(¼ of 20 is 5). A formula for converting obtained scores directly
into standard scores with M = 100 and σ = 20 is the
following:


$$ X' = \frac{20}{\sigma}(X - M) + 100 $$


in which X' = standard score in the "new" distribution
X = original or raw score
M = mean of the raw score distribution
100 and 20 are the M and σ of the new distribution
σ = SD of the original or raw scores


Substituting for Mary's raw score of 50 in Test 3, we have

$$ X' = \frac{20}{7}(50 - 43) + 100 = 120 $$

Howard's standard scores in the new distribution are found
from the same formula. In Test 4, for example, Howard's
obtained score is 26, and from the formula we have

$$ X' = \frac{20}{4}(26 - 28) + 100 = -10 + 100 = 90 $$

The formula will convert any pupil's raw scores into standard
scores when the M of the standard score distribution is 100 and
σ is 20.

When put in standard score form, Mary's and Howard's scores
can be compared directly, and the five scores of each child can be
combined with equal weights. On the five tests, Mary's average
is 99 and Howard's 102.
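A sketch of the conversion to standard scores with M = 100 and σ = 20, starting from the tabulated σ-scores and rounding to whole points as the table does. The function name `to_standard` is illustrative. Howard's mean works out to 102.4, which the text reports as 102.

```python
def to_standard(z: float, new_mean: float = 100, new_sd: float = 20) -> float:
    """Re-express a sigma-score on a scale with an assigned mean and sigma."""
    return new_mean + new_sd * z


# sigma-scores for the five tests, as tabulated in the text.
mary_z = [-0.50, -0.25, 1.00, 0.75, -1.25]
howard_z = [0.00, 1.00, -0.29, -0.50, 0.38]

for name, zs in (("Mary", mary_z), ("Howard", howard_z)):
    # Round to whole points, as in the table of standard scores.
    scores = [round(to_standard(z)) for z in zs]
    total = sum(scores)
    print(name, scores, total, total / len(scores))
# Mary [90, 95, 120, 115, 75] 495 99.0
# Howard [100, 120, 94, 90, 108] 512 102.4
```

Because every test is now on the same scale, the simple average weights the five tests equally, which is the point made in the paragraph above.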


The IQ as a Standard Score 

When raw scores are converted into standard scores in a dis-
tribution with a mean of 100 and a σ of 15, these new scores
are often called "deviation IQ's" (page 36). In the Wechsler-
Bellevue Intelligence Test, for example, IQ's are determined by
this method. A Wechsler-Bellevue IQ of 115 is 1σ above the
mean of the group; an IQ of 85 is 1σ below the mean of
the group.

A general formula for transforming obtained scores into
standard scores with any given mean and σ is

$$ X' = \frac{\sigma'}{\sigma}(X - M) + M' $$

where
X' = standard score in new distribution
X = obtained score (usually in points)
σ' = SD of the new distribution
σ = SD of obtained score distribution
M' = M of standard score distribution
M = M of raw score distribution


This formula may be used to compute deviation IQ's. Suppose
that Arthur J., a veteran 32 years old, earns a score of 86 on an
intelligence test for which the mean of his age-group is 80 and
the σ is 10. What is Arthur J.'s deviation IQ? Substituting in the
formula, we have

$$ X' = \frac{15}{10}(86 - 80) + 100 = 109 $$
The formula is useful when we wish to convert the sub-tests of
a battery into comparable units which may be combined into a
single score.
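The general formula is a straight-line transformation, so one function covers both the M = 100, σ = 20 scale and deviation IQ's. A minimal sketch; the function names are illustrative, and the deviation-IQ helper simply fixes M' = 100 and σ' = 15 as described above.

```python
def linear_standard_score(raw, mean, sd, new_mean, new_sd):
    """X' = (sigma'/sigma)(X - M) + M': linear conversion to a scale with mean M' and SD sigma'."""
    return new_sd / sd * (raw - mean) + new_mean


def deviation_iq(raw, age_group_mean, age_group_sd):
    """Deviation IQ: a standard score on a scale with mean 100 and sigma 15."""
    return linear_standard_score(raw, age_group_mean, age_group_sd, 100, 15)


# Arthur J.: raw score 86, age-group mean 80, sigma 10.
print(deviation_iq(86, 80, 10))  # 109.0

# The M = 100, sigma = 20 scale is the same formula with other constants.
print(linear_standard_score(50, 43, 7, 100, 20))  # Mary, Test 3 (about 120)
print(linear_standard_score(26, 28, 4, 100, 20))  # Howard, Test 4 (90)
```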


Normalized Standard Scores, or T-Scores 


When raw point scores are transformed into PR's and the
resulting PR's are converted into equivalent "scores" in a normal
distribution, the final scores are said to be "normalized." If the
normal scaling distribution into which the scores are converted
has an M = 50 and σ = 10, the normalized scores are called
T-scores. Converting raw scores into T-scores can be easily
done with the aid of tables prepared for this purpose. First, the
PR's of the scores (or of the midpoints of the successive inter-
vals) are read from an ogive. The T-scores (normalized scores)
corresponding to these PR's are then read from tables. T-scores
range theoretically from 0 to 100, practically from about 15 to
85. The method of computing T-scores for a given distribution
will be found in detail in the references.

For several reasons, T-scaling is theoretically the soundest
method of converting raw scores into an equal-unit scale. Many
of the widely used educational achievement tests make use of
some variety of T-scaling. T-scores can be added or averaged;
they have the same meaning and denote the same relative
achievement from one test to another.
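A rough sketch of normalizing scores, assuming the conventional T-scale with M = 50 and σ = 10 (the values that give the theoretical 0-to-100 range cited above). Instead of reading PR's from an ogive and T-scores from printed tables, as the text describes, this sketch computes each PR directly (percent of scores below plus half the percent at the score) and converts it with the inverse normal cumulative distribution from Python's `statistics.NormalDist`. The raw scores are hypothetical, and the function names are illustrative.

```python
from collections import Counter
from statistics import NormalDist


def percentile_ranks(scores):
    """PR of each distinct score: percent of scores below it plus half the percent at it."""
    n = len(scores)
    counts = Counter(scores)
    ranks, below = {}, 0
    for s in sorted(counts):
        ranks[s] = 100 * (below + counts[s] / 2) / n
        below += counts[s]
    return ranks


def t_score(pr, mean=50, sd=10):
    """Normalized score: the point on a normal scale (mean 50, SD 10) with the same PR."""
    z = NormalDist().inv_cdf(pr / 100)  # PR must lie strictly between 0 and 100
    return mean + sd * z


# Hypothetical raw scores for a small class, for illustration only.
raw = [12, 15, 15, 18, 20, 20, 20, 23, 25, 30]
for score, pr in percentile_ranks(raw).items():
    print(score, round(pr, 1), round(t_score(pr)))
```

The midpoint definition of PR never yields exactly 0 or 100, which keeps the inverse normal lookup finite.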


NORMS 


typical or characteristic of pupils
of a given age or grade. To provide comparable norms, scores 


IQ terms. Such MA's and IQ's are rarely comparable to the 
MA's and IQ's of the Stanford-Binet. 


Educational achievement tests usually provide both age and
grade norms. From a table of norms, a teacher can tell whether
her class is up to grade level, and she can tell how individual
pupils in her class stand relative to each other on the sub-tests
of the battery. Suppose that Carl W., age 11-2 and just entering
the sixth grade, earns a score of 18 on an arithmetical problems
test of the Metropolitan Achievement Test. From the table of
norms we find that Carl has a PR of 68 on the test. Further-
more, we find that his age equivalent is 12-4 and his grade equiv-
alent is 6.9. Carl's score is typical of children about a year older
than he, and his knowledge of arithmetic equals that of children
who are completing the sixth grade. His PR, of course, reflects
performance above the average.

The SRA (Science Research Associates) verbal and nonverbal
tests are group tests of intelligence. Norms are given in
PR's and IQ's. If a child achieves a score of 34 on the verbal
section, for example, his PR from the table of norms is 40 and
his IQ (really a standard score) is given as 96. The Stanford
Achievement Test provides age and grade equivalents to obtained
scores. Raw scores from nine sub-tests are converted into an
equal-unit scale, in accordance with which a profile is drawn up
(page 35). Suppose that Louise M., age 12 years and 6 months
and in the last quarter of the seventh grade, earns a raw score of
40 on the science test of the battery. From tables of norms we
find that this score has a grade equivalent of 8.3 and an age
equivalent of 13-4. Thus Louise's score in science places her
above her age and grade levels. Her PR on the science test is 60.

Many aptitude tests supply scaled score norms for various
groups of workers differing in experience, training, and skill.
Interest inventories are scored so as to reflect an applicant's
interests in a large number of occupations. Thus if the vocational-
interest blank is scored with the key for lawyer interests, we can
tell whether the applicant has the interests of a lawyer and to
what extent. Scores from personality questionnaires serve to iden-
tify a subject as "dominant," "introverted," or "neurotic" in
relation to the norms given for these classifications.
Teachers use test norms for a number of purposes, which will
be elaborated upon in later chapters. Among the more important
objectives, we may list the following.


1. To estimate group achievement. Performance of the class as
a whole can be evaluated against national, state, or local
norms (page 11s).

2. To evaluate individual achievement. A pupil's score on an
educational achievement test should always be considered in
connection with his native capacity or mental alertness. A slow
or dull child may be working up to his limit, whereas a bright
child may be performing below expectation.

3. To evaluate family and cultural background. The achieve-
ment of a class or of an individual always depends in part on
socio-economic status, family background, and opportunities.

4. To evaluate curriculum effects. A pupil's achievement
must be judged as good or poor in the light of the content,
emphasis, and objectives of the school.

5. To measure individual differences. There are always wide
differences in academic achievement within a group or class.
These differences are due in part to differences in native
ability and in part to differences in environmental oppor-
tunities.
QUESTIONS FOR DISCUSSION

1. A fifty-item multiple-choice test in science, administered to ninety
pupils, showed scores ranging from 16 to 38. Fifty scores fell between
35 and 48. What would the distribution be like? Would it be skewed?
What measure of central tendency would be most suitable? 
2. In question 1, what would you conclude about the suitability of the
test for the group?
3. Explain the implications of each of the following correlation
coefficients:
(a) The correlation between height and score on an arithmetic test
is .04.

(b) Ratings of pupils for social adjustment and aggressiveness show a
correlation of −.65.

(c) The correlation between term grades and scores on a group
intelligence test is .70.

4. Rank the following 23 scores in order of size: 35, 40, 31, 29, 35, 23,
32, 34, 28, 34, 15, 14, 34, 40, 22, 32, 30, 39, 50, 19, 40, 27, 37. Compare the
"mid-score" with the mean.

5. Karl's PR on a biology test is 48. What does this mean?

6. Margaret has taken five tests. What would be the advantage of 
expressing her scores on these tests as PR's? 

7. Given the following: 


        Paragraph Reading    Arithmetic
Mean          81.7              38.5
σ              9.2               6.5


William achieves a score of 56 on the first test and 35 on the second. 
Convert these raw scores into z-scores. 

8. How are age and grade norms obtained? Which is the more useful
in determining placement? 

9. Two classes earn about the same mean on a test, but Class A's SD is 
twice the size of Class B's. What do you conclude from this fact? 

10. How would you validate a teacher-made test? 


CHAPTER 3 


INDIVIDUAL INTELLIGENCE SCALES 


This chapter will consider four individual intelligence scales or
test batteries.* These are (1) the Stanford-Binet** (the 1937, or
revised, form), designed for children from age 2 through …
(2) the Wechsler-Bellevue Intelligence Scale, … with adults;
(3) the Wechsler Intelligence Scale for Children (WISC); and
(4) the Arthur Performance Scale, …


* A test battery is a group of carefully selected tests designed to operate as a team.

** The full name is Stanford Revision of the Binet-Simon Scale.
widely used, valid and dependable. Ordinarily, the individual
intelligence test will not be administered by the classroom 
teacher. But the teacher must be familiar with the make-up of 
these scales and with their role in the school program if he is to 
make good use of the test findings. 

The individual intelligence examination should not be admin- 
istered by a novice. To give such a test—and more important to 
interpret it—requires special training in mental measurement 
and in clinical psychology, plus a sound knowledge of psycho- 
logical theory. In addition, at least six months should be spent in 
giving and scoring these tests under supervision, if one is to have 
a minimum of "clinical experience." Unfortunately, perhaps, 
directions and materials for giving the individual scales are 
readily available in the manuals, and the beginner is tempted to 
try his hand at administering the tests. Much undeserved criticism 
of the individual intelligence test (and of the MA and IQ) has
arisen from the faulty administration and interpretation of these 
scales by the unskilled amateur. 


The Concept of General Intelligence 


Before examining the individual intelligence scales in detail, 
we must get a clearer notion of what the tests are attempting to 
measure. This means that we must formulate a definition of what 
is meant by “general intelligence.” 

Definitions of general intelligence have run the gamut from 
such comprehensive biological descriptions as adjustment to the 
environment to the fairly narrow designation of aptitude for 
academic work. The French psychologist Alfred Binet defined 
intelligence as (1) the ability to take and maintain a definite 
direction, that is, to carry through a course of action once
begun; (2) adaptability to new situations and new requirements; 
and (3) the power to evaluate and criticize one’s own acts (not 
present in the feebleminded). Other psychologists agreeing in 
the main with Binet have stressed adjustment to life and capacity 
to learn. In contrast with these broad formulations, Lewis M.
Terman, author of the Stanford-Binet, has defined intelligence 
simply as the ability to carry on abstract thinking. 

Definitions of general intelligence must of necessity be broad
when they stress biological adaptability to life. Such definitions
are hardly incorrect, but neither are they useful. Indeed, any
attempt to encompass such a comprehensive function as general
adaptability is a well-nigh impossible task. On the other hand, a
definition of intelligence simply as the ability to do school work
is certainly too narrow; we should include proficiency in every-
day activities in business and the professions, where aptitude
displayed in school finds ready application.

In order to give greater precision to the concept of intelligence,
the educational psychologist Edward L. Thorndike has suggested
that we recognize at least three broad areas of intelligent be-
havior. These "intelligences" he called abstract, mechanical, and
social. Abstract intelligence he defined as the "ability to under-
stand and manage ideas and symbols, such as words, numbers,
chemical or physical formulas, legal decisions, scientific prin-
ciples and the like. . . ." In the case of students, this is very close
to what is called scholastic aptitude. Mechanical intelligence in-
cludes "the ability to learn, to understand and manage things and
mechanisms, such as a knife, a gun, a mowing machine, an auto-
mobile, a boat, a lathe. . . ." Social intelligence is "the ability to
understand and manage men and women, boys and girls, to act
wisely in human relations." We should expect to find high ab-
stract intelligence in scholars, scientists, executives in business
and government; high mechanical intelligence in mechanics,
builders, expert carpenters and plumbers; and high social in-
telligence in politicians, salespeople, leaders in society …
probably the successful civil engineer … mechanical and social
intelligence. These "intelligences" are positively, but not always
highly, correlated. Hence, a high level of one "intelligence" may
accompany a fairly low degree of another. A nuclear physicist
(high in abstract intelligence) may be socially inept. And the man
successful in business or politics may be mediocre in mechanical
skills. Perhaps the able jack-of-all-trades can be expected to rate
well, but not necessarily very high, in all three areas.

On examining the individual intelligence test, we find that it 
presents a variety of problems which demand the ability to utilize 
ideas and symbols, for example, words, numbers, diagrams, 
pictures, geometrical figures. When used with young children, 
general intelligence tests are primarily measures of mental alert- 
ness on the abstract level. For adults, these tests are measures of 
the aptitude for such occupational and other tasks as draw upon
abilities operative in school work. In short, the individual intel- 
ligence test measures abstract or scholastic ability primarily and 
is rarely a gauge of mechanical aptitude or of social competence. 
The evidence for this view comes from an analysis of the tests 
themselves, as well as from many studies in which individual 
intelligence tests have been used. 


THE STANFORD-BINET INTELLIGENCE 
SCALE (1937 REVISION) 


Because of the time required to administer the Stanford-Binet
(in most cases forty minutes to an hour) and the training de-
manded of the examiner, this test is not given routinely in
most schools. The classroom teacher must be generally familiar
with the Stanford-Binet, however, in order to know what can
be expected of it, that is, how it might add to her knowledge
of a given pupil. The Stanford-Binet is a valuable supplement to
a group intelligence test or to an educational achievement ex-
amination when (1) a child has a severe reading disability or
some physical handicap (for example, in sight, hearing, or mus-
cular co-ordination); (2) a pupil exhibits marked emotional
stress or emotional disturbance; and (3) other test results
or school marks do not jibe with the teacher's estimate of the
pupil's ability. For purposes of routine classification and place-
ment, the group intelligence test is about as satisfactory as the
Stanford-Binet, but the latter will provide a more accurate, de-
tailed, and comprehensive appraisal of intellectual level and is
more useful in diagnosis and prediction.


Description. The 1937 edition of the Stanford-Binet represents
a careful and thorough re-working of the earlier 1916 scale. The
number of test items was increased from 90 to 129, and the scale
was extended down to lower age levels and much strengthened at
upper age levels. Two equivalent forms of the scale, called L and
M, were constructed. Table 3-1 contains a selection of the items
at different age levels. Note that at the lower age levels, such as
IV, the test situations make use of objects and pictures and require
that the child understand and carry out oral directions. At the
upper age levels, X, XIV, and Average Adult, the test items are
more abstract and bookish; the problems require verbal and
numerical manipulation, reasoning, logical selection and choice,
and good judgment. Memory for numbers and for sentences
recurs throughout the scale. Questions dealing with specific facts
learned in and out of school are excluded, but many common-
knowledge questions are included on the reasonable assumption
that what a person has learned in everyday living is a good index
of what he can learn, and will learn, later on. Some Stanford-
Binet test materials are shown in Figure 3-1 (facing page 54).


TABLE 3-1 Illustrative Tests from Stanford-Binet Scale
for Years IV, X, XIV and Average Adult

Year IV
1. Picture Vocabulary: Child must recognize and name everyday
   objects seen in the pictures.
2. Naming Objects from Memory: Child is shown small toys
   representing common objects. These he names, or they are named
   for him. Later he must recall from memory the name of each object.
3. Picture Completion: Child must finish the incomplete drawing
   of a man.
4. Pictorial Identification: Pictures of objects on cards to be
   identified.
5. Discrimination of Forms: Recognition and identification of
   simple geometrical forms.
6. Comprehension: Sensible answers to "why" questions.
Alternate: Memory for Sentences: Repetition of short sentences read
   aloud to the child.

Year X
1. Vocabulary: The examinee must give definitions of eleven words
   in a standard vocabulary list.
2. Picture Absurdities: Must recognize what is "foolish" in a
   presented picture.
3. Reading and Report: Reads a selection and reports from memory
   what is read.
4. Finding Reasons: Gives sensible reasons to explain cause-and-
   effect relations in familiar situations.
5. Word Naming: Names as many words as he can in one minute: a
   measure of word fluency.
6. Repeating Six Digits: The lists are read aloud at the rate of
   about one a second.

Year XIV
1. Vocabulary: Larger vocabulary required than at Year X.
2. Induction: Tests ability to grasp and apply a general rule.
3. Picture Absurdities: Must recognize what is "foolish" in a
   picture; more difficult than at Year X.
4. Ingenuity: Tests ability to solve problems mentally.
5. Orientation: Direction I: Must be able to solve problems
   involving space relations by following fairly complex directions.
6. Abstract Words II: Must define words like "loyalty" and
   "justice."

Average Adult
1. Vocabulary: Larger vocabulary than at Year XIV.
2. Codes: Must learn two codes and write messages in them.
3. Differences Between Abstract Words: Tests ability to generalize;
   makes use of fairly difficult concepts.
4. Arithmetical Reasoning: Requires solution of mental arithmetic
   problems.
5. Proverbs: Interpretation of proverbs and fables.
6. Ingenuity: Solution of problems requiring "mental manipulation."
7. Memory for Sentences: Tests ability to reproduce rather long and
   involved sentences heard once.
8. Reconciliation of Opposites: Must tell how words denoting
   opposite states are alike. Tests ability to grasp abstract
   relations.


Scope. The placement of test items at a given age level was
made to depend on the responses of one hundred children at each
age level below 6, two hundred children at ages 6 to 14 inclusive,
and one hundred children at ages 15 through 18. In all, about
3,000 children constituted the standardization group.

Terman and his co-workers selected children whose parents
constituted a good cross section of occupational levels in the
United States for the year 1930. The Stanford-Binet, like the
original Binet Scale, is an age-scale. (See page 32.) It begins at
2 years, and items are grouped at one-half year intervals (at 2,
2½, 3, 3½, 4, 4½) up to 5 years. Mental growth at the lower
age levels is so rapid that the authors of the scale thought it wise
to narrow the gaps between age levels over this range. From 5
years to 14, test items are grouped by year intervals; and beyond
14 there is an average adult level and three superior adult levels.
The Stanford-Binet is most useful over the age range from about
6 to 14, that is, over the elementary grades.


Scoring. The Stanford-Binet assigns a mental age (MA) to a
child in accordance with his ability to progress up the age scale.
As shown in the examples on page 51, two children may earn the
same MA on the Stanford-Binet in different ways.

James, who is 9-3 or 111 months old, earns an MA of 8-10, or
106 months, by scattering his answers up the scale from age
VII to age XIII. Robert also earns an MA of 106 months, but
does not scatter as much as James. MA is a measure of mental
maturity or status. Children differ in the way in which they
answer the test items, but by and large a child comes out with an
MA which indicates his ability to perform mental-manipulative
tasks like those of the Scale.


Test Record of James Brown, chronological age 9-3,
or 111 months*

Year     Tests Passed     Months Credit    Total Credit
Level    and Failed       per Test         Years   Months
VII      all passed                          7
VIII     four passed      2 months                   8
IX       three passed     2 months                   6
X        two passed       2 months                   4
XI       one passed       2 months                   2
XII      one passed       2 months                   2
XIII     all failed                                  0

MA = 8-10 or 106 months
James' IQ = MA/CA × 100 = 106/111 × 100 = 95

* The expression 9-3 means 9 years and 3 months.


Test Record of Robert Green, chronological age 8-4,
or 100 months

Year     Tests Passed     Months Credit    Total Credit
Level    and Failed       per Test         Years   Months
VII      all passed                          7
VIII     five passed      2 months                  10
IX       four passed      2 months                   8
X        two passed       2 months                   4
XI       all failed                                  0
Total                                        7      22

MA = 8-10 or 106 months
Robert's IQ = 106/100 × 100 = 106 (decimal dropped)
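The scoring of the two test records can be expressed as a small sketch: MA is the year level at which every test was passed, in months, plus the months of credit earned at higher levels, and the IQ is MA/CA × 100 with the decimal dropped. The function names are illustrative, and the sketch covers only this arithmetic, not the Manual's rules for administering or crediting items.

```python
def mental_age_months(all_passed_year, months_credit):
    """MA in months: the highest year level with all tests passed, plus the extra months of credit."""
    return all_passed_year * 12 + sum(months_credit)


def ratio_iq(ma_months, ca_months):
    """IQ = MA / CA x 100, with the decimal dropped (integer division)."""
    return ma_months * 100 // ca_months


# James Brown: all tests passed at year VII; credits at years VIII-XIII; CA 9-3 = 111 months.
james_ma = mental_age_months(7, [8, 6, 4, 2, 2, 0])
# Robert Green: all tests passed at year VII; credits at years VIII-XI; CA 8-4 = 100 months.
robert_ma = mental_age_months(7, [10, 8, 4, 0])

print(james_ma, ratio_iq(james_ma, 111))    # 106 95
print(robert_ma, ratio_iq(robert_ma, 100))  # 106 106
```

Integer arithmetic is used for the quotient so that "decimal dropped" means truncation exactly, with no floating-point rounding surprises.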

The intelligence quotient, or IQ, is found by dividing the
child's MA by his CA (chronological age) and multiplying by 100;
it is a measure of brightness or dullness. James has an IQ of
106/111, or 95, and Robert, who is 11 months younger, has an IQ
of 106/100, or 106. Both boys have the same mental maturity, but
Robert is brighter than James because he has reached the maturity
level of 8-10 at an earlier age. The two measures, MA and IQ, are
complementary, each providing distinctive information. A child
of 8 and a man of 40 may each earn an MA of 8 years on the
Stanford-Binet (be of the same mental status in terms of the
tests). But the child has an IQ of 100 (8/8) and is normal, whereas
the man is feebleminded, with an IQ of approximately 53. (Read
from the tables in the Manual.)*

The IQ is a developmental ratio which inevitably loses its
value as a child grows older and mental maturity is approached.
There is little difference in mean performance on the Stanford-
Binet at ages 15, 18, and 20, and a correction table is provided
in the Manual which adjusts the CA divisors in order to make the
older person's IQ comparable to that of the child. There is no
specific age at which intelligence can be said to "mature" or
reach its peak, but 15 is taken somewhat arbitrarily to be the MA
of the average adult on the Stanford-Binet. For any person over
16, therefore, the corrected divisor is 15 years. The highest MA
which can be earned on the Stanford-Binet by passing all of the
tests in the Scale is 22¾ years (273 months). This MA yields a
maximum IQ for adults of 152, found by dividing 273 months by
180 months (that is, 15 years).
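For adults, the same ratio is taken against a fixed divisor. This sketch applies only the one rule stated above (a 15-year, 180-month divisor for anyone over 16) and does not reproduce the Manual's correction table for intermediate ages. It rounds to the nearest whole point, which reproduces the text's figures of 53 and 152; the names are illustrative.

```python
ADULT_DIVISOR_MONTHS = 15 * 12  # corrected CA divisor for anyone over 16


def adult_ratio_iq(ma_months: int) -> int:
    """Ratio IQ for a person over 16: MA divided by a fixed 15-year divisor, rounded."""
    return round(ma_months * 100 / ADULT_DIVISOR_MONTHS)


print(adult_ratio_iq(8 * 12))  # 53   (the man of 40 with an MA of 8 years)
print(adult_ratio_iq(273))     # 152  (the maximum MA of 273 months)
```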


STANFORD-BINET IN THE SCHOOLS


The evaluation of pupils from their school grades or from sub-
jective impressions of cleverness or brightness is often quite
misleading. A teacher may describe a conscientious, amiable girl
of ten who is one year overage for grade as "bright" when her
IQ turns out to be relatively modest. Contrariwise, a rude, in-
attentive youngster may be rated as "about average" or even
below average when his IQ is in reality considerably above
normal. Judgments of intelligence are always influenced by
personality traits and social behaviors. It is not strange, there-
fore, to find that two pupils must in general differ by as much
as twenty points of IQ before a teacher is forced to lay other
criteria aside and admit that the badly behaved youngster is
brighter than the courteous, hardworking one.

* See Terman, L. M., and Merrill, M. A. Measuring Intelligence. New York:
Houghton Mifflin Co., 1937. Tables, pp. 315-450.

Teachers should know certain facts about the IQ, what it is
and how best it can be used, in order to make maximum use of
the information provided by the test. More specifically, the
classroom teacher should know (1) the range of IQ's to be ex-
pected in the school population, (2) the dependence to be placed
on the IQ as a measure of intelligence, (3) to what extent the
test has diagnostic value, and (4) the limitations of the IQ and
the precautions to be observed in making interpretations based
on it. These topics will be considered in the following sections.


Range of IQ's in the School Population


The frequency polygon in Figure 3-2 shows the distribution
or spread of IQ's for the nearly 3,000 children from 2 to 18
years old who made up the standardization sample. The fre-
quency polygon is close to the normal curve model (page 17).
IQ's center at 100 and range about equally above and below this
value. The σ of the IQ distribution is about sixteen points
(exactly 16.4). This means that the middle 2/3 of school chil-
dren will earn IQ's between 84 and 116. About 1/6 of the
children will have IQ's above 116 and 1/6 will have IQ's below
84. See Figure 2-7.

FIGURE 3-2 Distribution of IQ's on the Stanford-Binet Scale
for Nearly 3,000 Children, 2-18 Years Old (from Terman, Lewis M.,
and Merrill, Maud A., Measuring Intelligence; reproduced by
permission of Houghton Mifflin)

The percentage of school children who can be expected to
occupy the different IQ levels may be summarized as follows:


TABLE 3-2 Numbers of Children in the School Population to Be
Expected at Various IQ Levels

IQ Level        Description              Percent of Children
                                         in Each Category
130 and above   Superior or gifted       3-5
110-129         Above average to high    25
90-109          Average or normal        45-50
70-89           Low normal to dull       20-25
Below 70        Dull to feebleminded     2-3


The number of children found in any group (especially in the 
two extreme groups) will vary somewhat with the social and 
economic conditions of the community and with the standards 
set up for defining the different intelligence levels. 
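As a rough check on Table 3-2, the sketch below asks what percentages a perfectly normal distribution with M = 100 and σ = 16.4 would place in each band. It assumes whole-number IQ's (so each band is widened by half a point at either end) and uses Python's `statistics.NormalDist`; it is an idealization, not Terman and Merrill's tabulated data. The computed percentages come out close to the ranges in the table, and the ±1σ band (about 84 to 116) holds roughly two-thirds of the cases.

```python
from statistics import NormalDist

# Idealized normal curve with the standardization group's mean and sigma.
iq_curve = NormalDist(mu=100, sigma=16.4)

# Bands from Table 3-2. IQ's are whole numbers, so each band is taken to run
# from half a point below its lowest IQ to half a point above its highest.
bands = [
    ("130 and above", 129.5, float("inf")),
    ("110-129", 109.5, 129.5),
    ("90-109", 89.5, 109.5),
    ("70-89", 69.5, 89.5),
    ("Below 70", float("-inf"), 69.5),
]


def band_percent(lo: float, hi: float) -> float:
    """Percent of the normal curve lying between lo and hi."""
    return 100 * (iq_curve.cdf(hi) - iq_curve.cdf(lo))


for label, lo, hi in bands:
    print(f"{label:>14}: {band_percent(lo, hi):4.1f}%")

# Share within 1 sigma of the mean (about IQ 84 to 116).
print(f"middle band: {band_percent(100 - 16.4, 100 + 16.4):.1f}%")
```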

The IQ is useful in setting educational expectations. Suppose 
that William Butler, a fifth-grade pupil in a large school system, 
has a chronological age of 10-2 and a Stanford-Binet IQ of 116. 
William reads at fifth-grade level, is somewhat above average in
his other subjects, and is excellent in arithmetic. He is a quiet,
well-behaved boy who seldom becomes angry or annoyed.
William makes friends readily and is accepted as a member of
his group. What are William's educational expectations?

Table 3-3 will be of help in answering this question. Since an IQ
of 116 is about 1σ above the mean, William falls in the upper
16 per cent of school children. He should have no trouble
completing elementary and high school. If he is industrious,
emotionally stable, and has intellectual interests, William may
be encouraged to go to college. It might be wise to advise a
college of not too high standards if William is lacking in
self-confidence, and to enlist his parents' enthusiastic support.

Mary, age 12-0 with an IQ of 83, presents a very different
picture from that of William. Mary is doing barely passing work
in the fifth grade, though she is about two years overage for the
grade. Since her MA is not more than 10 years,* she is perhaps
doing all we can reasonably expect of her. It would be manifestly
unfair to scold Mary and insist that she "try harder." Mary's
educational expectation (see Table 3-3) is no higher than the
eighth grade, if that.

* Since IQ = MA/CA × 100, Mary's MA is 12 × .83, or about 10 years.


TABLE 3-3 Educational Expectation in Relation to IQ Level

IQ Level
(Stanford-Binet)   Educational Expectation

120+        Can do acceptable work in a first-class college if properly
            motivated.
115-119     Should do acceptable but not outstanding college work.
            Would probably do best in a small college where the work
            is individual and standards not too high.
105-114     Should complete high school, and may do well in the less
            difficult college courses. Will have trouble with science
            and mathematics.
90-104      This group constitutes about 50% of the elementary school
            population. If not retarded by illness or other causes,
            should complete the eighth grade on schedule. Some of these
            pupils will do fairly well in high school.
80-89       Usually one to two years over age for grade. Acceptable
            high school work very unlikely for IQ's below 90. A child
            of IQ 80 will complete the eighth grade, if at all, two or
            three years behind schedule.
75-79       These children may reach the fifth grade. Will rarely go
            beyond unless given much individual attention.
Below 75    If one of these children reaches the fifth grade he will be
            14-15 years old. Unable to do fifth-grade work; but because
            of chronological age is likely to be pushed ahead after
            repeating each grade two or three times. May be promoted
            because of age far beyond his mental capacity.
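Reading Table 3-3 amounts to finding the band whose lower bound is the largest one not exceeding the pupil's IQ. The sketch below does this with Python's `bisect` module; the expectation strings are shortened paraphrases of the table, and the helper names are hypothetical. It also inverts IQ = MA/CA × 100 to recover the footnote's estimate of Mary's MA.

```python
from bisect import bisect_right

# Lower bounds of the IQ bands in Table 3-3 (ascending), and a short
# paraphrase of each band's expectation; index 0 is the "Below 75" band.
BAND_FLOORS = [75, 80, 90, 105, 115, 120]
EXPECTATIONS = [
    "Below 75: unable to do fifth-grade work",
    "75-79: may reach the fifth grade",
    "80-89: usually one to two years over age for grade",
    "90-104: should complete the eighth grade on schedule",
    "105-114: should complete high school",
    "115-119: acceptable but not outstanding college work",
    "120+: acceptable work in a first-class college if motivated",
]


def expectation(iq: int) -> str:
    """Return the Table 3-3 band containing the given IQ."""
    return EXPECTATIONS[bisect_right(BAND_FLOORS, iq)]


def estimated_ma(iq: int, ca_years: float) -> float:
    """Invert IQ = MA / CA x 100 to estimate mental age in years."""
    return ca_years * iq / 100


print(expectation(116))                                  # William: the 115-119 band
print(expectation(83), round(estimated_ma(83, 12), 1))   # Mary: 80-89 band, MA about 10
```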


Stability of the IQ 


When a second form of the Stanford-Binet is administered to
a child, this second IQ will often vary somewhat up or down
from the first determination. Norman's IQ, for example, may
be 109 today, whereas it was 112 six months ago, and may very
well be 106 six months hence. The stability of a test score when
the test is repeated or another form given is called the
reliability of the test (page 28). The Stanford-Binet is one of
our most dependable mental examinations, with reliability co-
efficients which are usually well over .90 (page 29). Despite
this fact, fluctuations in individual IQ's can still be expected
when the test is repeated or a second form administered.

The reliability of a test is conveniently expressed by the standard error (SE) of a test score (page 30). The SE gives the allowable (one might almost say the inevitable) changes to be expected
when a second form of the test is given. The SE of the Stanford-Binet IQ is four to five points* for IQ's between 90 and 110.
The SE is slightly higher for high IQ's and somewhat lower for
low IQ's. Expressed in terms of chances or probability of change,
an SE of five points means that the odds are roughly 2:1 that an
IQ of 102, for example, will not be higher than 107 (102 + 5)
nor lower than 97 (102 - 5) on retest. The SE represents the
amount of fluctuation to be expected in most cases. The change
in a few individual cases may be somewhat greater than five
points or somewhat less than five points. Fluctuations in IQ from
time to time arise from many causes: changes in the testing
situation and changes in the child being tested. When a child's
mental or physical health or his home or school environment
changes radically between tests, fluctuations in IQ can be
expected. Mental measurement is never as precise as physical
measurement: a child is a much more variable "object" than, say, a
piece of metal. Changes in IQ from one test to another rarely
shift a child from one classification to another, however (see
Table 3-3)—that is, from normal to superior or from dull to
normal. The consensus, in fact, is that the IQ is extremely hard to
change and that we can accept an IQ when expertly determined
as a reliable appraisal of a child's general mental level.

* When $SE_{IQ} = 16\sqrt{1 - .90}$, we have 5 as the approximate value of the SE.
This is a slight overestimation, as the reliability coefficient is usually above .90.
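The footnote's arithmetic and the "roughly 2:1" odds can be checked with a short Python sketch. The sketch assumes, as the footnote implies, an IQ standard deviation of 16 and a reliability of .90. It also assumes that retest differences are normally distributed, which the text does not state.

```python
import math

def standard_error(sd: float, reliability: float) -> float:
    """Standard error of a score: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def prob_within_k_se(k: float) -> float:
    """Chance that a normally distributed retest falls within +/- k SE."""
    return math.erf(k / math.sqrt(2.0))

se = standard_error(16, 0.90)       # about 5.06 IQ points
low, high = 102 - 5, 102 + 5        # the band used in the text: 97 to 107
p = prob_within_k_se(1.0)           # about .68
odds = p / (1.0 - p)                # a little over 2, i.e. roughly 2:1
print(f"SE = {se:.2f}; band {low}-{high}; P = {p:.2f}; odds = {odds:.2f}:1")
```

About two retests in three should land within one SE of the first IQ. This is the sense of the 2:1 odds.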


The Stanford-Binet IQ in Diagnosis of Child Behavior 


Children who achieve the same mental age will differ in IQ
when their CA's differ (page 51). Furthermore, even when the
IQ is the same, two children may differ sharply in various
aspects of mental development, as shown by the sorts of tests
passed and failed and by the degree of scatter over the scale.
The Stanford-Binet is primarily a standard test-interview designed to furnish a cross-sectional view of a child's intellectual
capacities—that is, to give the level at which the child normally
functions. At the same time, the school psychologist, in writing
an account of a child's performance on the test, will usually
note irregularities in development and learning ability, and these
observations provide the teacher with valuable clues to an understanding of the child. Visual handicaps, inco-ordinations, and
other physical handicaps may be noted; so also may be noted
deficiencies in arithmetic skills, in word comprehension, in reasoning, and in current information. The sub-tests of the Stanford-Binet call for fairly specific performances, and are not sufficiently
numerous or comprehensive to permit the final judgment that
"John is weak in number work, but excellent in rote memory,"
or that "Sarah's verbal facility far exceeds her manipulative
skills." But the pattern of a child's responses and the relative
strengths and weaknesses displayed on groups of items will provide useful information.

Parents are often puzzled when a child who is a discipline
problem is, at the same time, described as above normal in intelligence. The reason, of course, is that Stanford-Binet is not a
measure of social intelligence or of emotional stability but of
general verbal or abstract level (page 46). At the same time, the
observant psychologist will note and record characteristic emotional and temperamental behavior displayed as the child takes
the test. The rude or indifferent youngster, who doesn't care and
who doesn't co-operate; the spoiled and petulant "brat," who
gives up and pouts at the first failure; the timid and insecure
child, who inquires eagerly "Is that right?" after answering each
item—all these children reveal their distinctive personality traits
by the manner in which they tackle the test. Standards of behavior in the home, ideals of conduct, values, and attitudes are
often exhibited clearly, if indirectly, in the course of a mental 
examination. At Year VII, for example, is the question “What's 
the thing to do if another boy (or girl, depending on the sex of 
the examinee) hits you without meaning to do it?” The child 
who is immature socially or reared in a rough-and-tumble com- 
munity will answer promptly “Hit him back.” The 7-year-olds 
who are better trained in acceptable social practices will qualify 
their replies, or suggest that forgiveness may be in order if the 
blow were truly accidental. 

The following case histories will illustrate how qualitative
analysis of a child's test performance can help the classroom
teacher who refers him to the psychologist.

Case I. SM, a boy; CA = 10-2, MA = 8-2, IQ = 80. 

This boy was referred by his teacher because of unsatisfactory
work in the fourth grade. He is a good-looking, polite lad,
normal in appearance and in social manner. Anyone unacquainted
with his school work might judge SM to be average in intelligence, or perhaps above average. On the Stanford-Binet, SM's
vocabulary was childish, with definitions in terms of use. He
passed the vocabulary test only at Year VI. His answers to the
picture and verbal absurdities were halting, poorly phrased, and
uncomprehending. He had inaccurate and meager responses to
"seeing relations" items—differences and similarities. He was
poor in number relations; his co-ordination and rote memory
were fair. SM is a dull boy who may reach the eighth grade, but
is not likely to go beyond. It was recommended that SM undertake vocational training.


Case II. RW, a girl; CA = 11-2, MA = 11-3, IQ = 100.

RW is a well-developed girl, apparently calm and self-pos- 
sessed. She was referred by her teacher because of poor work in 
the sixth grade; she is described as being inattentive and given 
to daydreaming. RW seemed indifferent to the test, but did not 
refuse to co-operate. She often asked that a question be repeated, 
and the examiner suspects slight deafness. She became more in- 
terested as the test proceeded, especially when she got the answers 
to several questions. Her vocabulary is at Year X, but her verbal 
ability is about normal, as shown by her ability to deal with
pictures and verbal absurdities, name words, define abstract terms,
and deal with similarities. Her attention was somewhat variable
and she was easily distracted. She showed uncertainty in using
number relations, as, for example, in making change. RW is
normal in intelligence and should be able to do satisfactory work
in the sixth grade. It is suspected that her daydreaming is, in
part, a consequence of puberty. It was recommended that the
classroom teacher check on RW's friends, outside activities and
home conditions.


Case III. HP, a boy; CA = 6-5, MA = 9-6, IQ = 148. 

The second grade teacher is not sure what to do with HP; he 
seems to know everything she is teaching. HP’s father is a 
prominent surgeon. This boy entered school at 6-1 and was put 
in the second grade. He is well mannered, normal in play and in 
social activities, and gets along well with his classmates. HP 
whizzed through the tests for Years VI, VII, and VIII. His
vocabulary is at Year X. He defined an orange as "a citrus fruit,
round and yellow, comes from Florida." His co-ordination is not
up to his verbal level, but his memory and perception of differences and likenesses are excellent. HP is a very bright youngster.
He should be ready for high school by age 12 or earlier. He
should now be in the fourth grade, if he is ready for it socially. 
If promotion is not feasible, a program of outside reading and 
some special attention is suggested. 
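The IQ's in the three cases are ratio IQ's, 100 × MA/CA, with ages written as years-months. The minimal Python sketch below converts each age to months and recomputes the ratios. The function names are illustrative only.

```python
def to_months(years: int, months: int) -> int:
    """Convert an age written as years-months (e.g. 10-2) to months."""
    return 12 * years + months

def ratio_iq(ma: tuple[int, int], ca: tuple[int, int]) -> float:
    """Ratio IQ = 100 * MA / CA, with both ages expressed in months."""
    return 100 * to_months(*ma) / to_months(*ca)

cases = {
    "SM": ((8, 2), (10, 2)),    # reported IQ 80
    "RW": ((11, 3), (11, 2)),   # reported IQ 100
    "HP": ((9, 6), (6, 5)),     # reported IQ 148
}
for name, (ma, ca) in cases.items():
    print(name, round(ratio_iq(ma, ca), 1))
```

For SM and HP the computed values round to the reported IQ's. For RW the ratio works out to about 100.7, which agrees with the reported 100 if fractions are simply dropped. The text does not say which convention the examiner used.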


Precautions to Be Taken in Interpreting the IQ 


Some of the factors which may influence a child's IQ have
been touched upon in preceding paragraphs. To what extent the
IQ is an index of "innate ability" will depend upon the co-opera- 
tion and motivation of the examinee, and upon how expertly the 
test has been administered. Several conditions which may affect 
the reliability of an IQ are the following: 


Physical causes: Sensory defects, deafness, poor eyesight. Malnutrition
and illness are also important.

Examiner: The personal equation of the examiner may be crucial. Mental 
test examiners who are poorly trained, have harsh and unpleasant 
voices, peculiarities of manner or dress, or who are supercilious or 
arrogant in their relations with the child get poor co-operation and 
uncertain test results. 

Testing conditions: Test results are likely to be unreliable when the
examination room is bare, too cold or too warm, or overdecorated.
Coaching on the tests must always be watched for, since the tests have 
been widely distributed. 

Environmental surroundings: The degree of stimulation received in the 
home, the school and the community will markedly affect the test 
performance. Children from homes broken by divorce or by drunken- 
ness will often show IQ increases of as much as twenty points after 
several months of kind treatment. On the other hand, children from 
good homes who have been transferred to a deprived and restrictive
environment (as, for example, in war) may show sharp drops in IQ. 


Because of the many factors which may affect its determination, a Stanford-Binet IQ should not immediately be denounced
as worthless should there be a considerable shift in a second
rating. Instead, a drastic change in IQ should be taken as a
challenge, and the causes ferreted out if possible. The neglected
dull normal child when taken into a good home will often show
an increase in measured IQ, as will a child adopted into a good
family. By the same token, a normal child will do poor school
work if he is insecure and unhappy. It seems very unlikely that
even sharp changes in IQ reflect a real alteration in a child's 
aptitude. At least all of the environmental factors should be con- 
sidered before this conclusion is reached. 


Constancy of the IQ over the Age Range 


Suppose that Bob White, who is 7 years old, has an MA of 8
years on Stanford-Binet and an IQ of 8/7, or 114. When Bob
is 14 years old, his MA must be 16 years if his IQ is to remain 
constant at 114 (16/14 = 114). The IQ is a measure of bright- 
ness or dullness relative to a child's age group. Hence, should an 
IQ fluctuate widely—as, for example, from 114 to 85 or to 140— 
the ratio MA/CA becomes valueless. We have said earlier (page 
56) that when the IQ of a child has been determined by an 
expert, it is a highly dependable index. But whether the IQ 
remains constant over the years from 6 to 14 (over the elemen- 
tary school, for example) will depend, for one thing, on the way 
in which the test has been constructed. This question is appro- 
priate, therefore: "Is the Stanford-Binet so constructed as to 
make a constant IQ probable or even possible?” 

There are three conditions which an intelligence test must
meet if the IQ, defined as the ratio MA/CA,* is to remain
constant over the age-scale. These are:

1. Increased spread of MA's (larger SD's) as we go up the
age-scale.

2. Homogeneity of mental function over the age range covered by the scale. Homogeneity means that the test measures the
same "intelligence," for example, from age 2 to age 18.

3. Zero correlation between chronological age and IQ.

These conditions are met to a high—though not perfect—
degree by the Stanford-Binet. They are not met, even approximately, by most group intelligence tests (page 97). Let us
examine each condition further.

* The IQ may also be defined as a standard score (p. 39). The conditions for
IQ constancy, given above, apply only to age-scales.

1. The SD of the Stanford-Binet MA distributions increases
fairly regularly with chronological age. At Year VII, for example, the SD of the mental age distribution is 1 year; at Year X
the SD is 1.6 years; at Year XIII it is 2.3 years; and at Year XVI 
it is 2.6 years. This means that if Bob White has a CA of 7 years
and is one SD above the mean for his age (that is, at 8 years),
his IQ will be 8/7, or 114. If Bob maintains his rate of mental
growth, at age 10 his MA will be 1 SD above the mean, or at
11.6 years (10 + 1.6). Bob's IQ is now 11.6/10 or 116. At age
13, should Bob stay 1 SD above the mean, his MA will be 15.3
(13 + 2.3) and his IQ 117. And at age 16, should Bob maintain
his rate of growth, his MA should be 18.6 (16 + 2.6) and his IQ
18.6/16 or 116. Figure 3-3 shows that when a child maintains
an accelerated rate of growth, his IQ (like Bob's) will remain 
approximately constant—that is, within 2 to 3 points. 
FIGURE 3-3 Age-Progress Curves for the Stanford-Binet Scale
[Note that the spread of MA's becomes greater with increasing
chronological age. Vertical axis: mental age; horizontal axis:
chronological age.]


Figure 3-3 shows that when a child is below the mean for his
age, his IQ will again remain approximately constant should he
maintain his slower rate of growth. If a child has an MA of 6 and a
CA of 7—is 1 SD below the mean for his age—his IQ will be
6/7, or about 86. Should this child maintain his slower rate of
growth, at Year XIV his MA will be 11.8 (14 - 2.2) and his IQ
will be 11.8/14, or 84. It is the increasing spread of MA's with
increasing CA which keeps the ratio MA/CA constant within
2 to 3 points, always provided the child maintains a constant rate
of growth. (See Figure 3-3.)
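Bob White's arithmetic can be replayed in a few lines of Python. The sketch uses the MA standard deviations quoted in the text and, like the text, assumes that the mean MA equals CA at each age. It also adds a hypothetical scale whose SD stays fixed at 1 year, to show why condition 1 matters.

```python
# SDs of Stanford-Binet mental age at each chronological age, as given in the text
MA_SD = {7: 1.0, 10: 1.6, 13: 2.3, 16: 2.6}

def ratio_iq_at(ca: int, z: float) -> float:
    """IQ of a child who stays z SDs from the mean MA for his age.
    Assumes the mean MA equals CA at every age, as in the text's example."""
    ma = ca + z * MA_SD[ca]
    return 100 * ma / ca

bob = {ca: ratio_iq_at(ca, +1.0) for ca in MA_SD}

# Hypothetical comparison: a scale whose MA spread stays at 1 year for all ages.
flat = {ca: 100 * (ca + 1.0) / ca for ca in MA_SD}

print({ca: round(iq, 1) for ca, iq in bob.items()})
print({ca: round(iq, 1) for ca, iq in flat.items()})
```

With the widening SD's, Bob's IQ stays between about 114 and 118. With a fixed 1-year spread, the same child would drift down to about 106 by age 16.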

2. Statistical analysis has shown that the correlation between 
successive MA levels is very high, and that the Stanford-Binet 
is measuring essentially the same "intelligence" as we go up the 
age-scale. 

3. When a child reaches the upper ’teens, mental growth 
changes as shown by the Stanford-Binet fail to keep pace with 
chronological age. When this happens, the curves in Figure 3-3 
lose altitude and bend over to become parallel with the baseline. 
Failure of the MA to increase with CA leads inevitably to a falling IQ among older children and, if uncorrected, to a negative
correlation between CA and IQ. (Negative correlation follows
because the CA continues to increase, whereas the MA no longer
does—see page 52.) To overcome this fault in the age-scale,
the authors of Stanford-Binet provide a steadily decreasing CA
divisor from age 13 and above. This procedure bolsters up the
IQ by lessening the denominator (CA) and thus balancing the
decreasing numerator (MA). This means that a child's IQ does
not have to decrease as the child grows older—and that there is
no systematic correlation (positive or negative) between CA
and IQ.


THE WECHSLER-BELLEVUE 
INTELLIGENCE SCALE* 


Description. The Stanford-Binet is sometimes used to measure
the intelligence of adolescents and young adults, but it is not well
suited to these groups, since the items of the test were selected to
appeal primarily to children. A better examination for measuring
adult intelligence is the Wechsler-Bellevue Intelligence Scale, an
individual intelligence test designed especially for adults. The
Wechsler-Bellevue is, on the whole, a well-made examination.
The group used in standardizing the test battery—that is, the
group upon whose answers the scoring and norms depend—
consisted of about 1700 persons chosen from a larger group of
3500. The sample was chosen to represent the occupational distribution of the adult population at the time of the 1930 census.
The sample is adequate in size, but the fact that it was drawn
mostly from New York City and New York State renders questionable its claim to represent the country as a whole.

* The Wechsler Adult Intelligence Scale (WAIS), published in 1955, represents a revision and restandardizing of the Wechsler-Bellevue Intelligence Scale
(W-BIS). WAIS makes use of the same principles of construction, scoring and
IQ derivation found in the older scale, and the two are essentially the same test.
W-BIS is described here rather than WAIS because it is better known and is
still widely used.

The Wechsler-Bellevue Scale consists of two parts, a Verbal
Scale and a Performance Scale. Language is required in the first
scale, but the tests in the second part demand no language in the
actual solution of the problems. Directions, however, are given
orally. What is called the Full Scale is a combination of the
Verbal and Performance sections. The Verbal Scale is made up
of five tests and an alternate, as follows:


VERBAL SCALE 


1. General Information: Twenty-five questions covering a wide range of 
common information and dealing with facts which all normal adults 
have presumably had a chance to learn. Questions are graded in 
difficulty from easy to hard. 

2. General Comprehension: Ten questions and two alternates, in each 
of which the examinee is asked to tell what should be done in certain 
situations, or why certain practices should be followed. The questions 
are planned to measure practical judgment, common sense, and 
understanding.

3. Arithmetic Reasoning: Ten mental arithmetic problems. Each problem 
is presented orally and must be solved without the use of paper or 
pencil (“in the head"). 

4. Digits Forward and Backward: Memory span for digits presented one 
at a time and ranging in number from 3 to 9. In the second part of 
the test, the examinee must give the list of numbers in reverse order.

5. Similarities: Twelve word-pairs, each pair alike in some way. The
examinee must say in what way the two words are alike.

6. Vocabulary (Alternate): A list of forty-two words graded in difficulty to be defined orally.


There are five tests in the Performance Scale, as follows:


PERFORMANCE SCALE 


1. Picture Completion: Fifteen cards, each containing a picture from
which some part is missing. The examinee must give the missing part.

2. Picture Arrangement: Six sets of pictures, each set containing from
three to six separate pictures. The examinee is to arrange the pictures
in any given set so that they tell a story.

3. Object Assembly: Three form-boards—the Manikin, the Profile, and
the Hand. The parts of each form-board must be put together, much
as in a jigsaw puzzle, to form a complete object.

4. Block Design: Sixteen small cubes (blocks) colored red, white, and
red-and-white on the sides. The blocks are to be arranged to match
seven designs presented on test cards. The designs require from four
to sixteen cubes.

5. Digit Symbol: A well-known association test. Nine numbers are 
matched with nine symbols in accordance with a key. 

Samples of the items from the performance part of the 
Wechsler Adult Intelligence Scale are shown in Figure 3-4 
(facing page 54). These tests are “performance” in the sense 
that the examinee in solving the problem must make use of 
diagrams, pictures, form boards, and cubes. But “ideas” —that 
is, symbols—are certainly not excluded. Wechsler's performance 
tests, therefore, are measuring abstract rather than motor or 
mechanical intelligence. 

Scope. The Wechsler-Bellevue Scale provides scores in the 
form of “IQ’s.” Norms run as low as 10 years, but the scale's 
principal application is over the age range from about 29 to 60 
years. Beyond 60 years, Wechsler-Bellevue IQ’s are not always 
dependable, owing in part to the small samples at advanced age 
levels. But these IQ's may be taken as useful estimates of general
intelligence. Age-level scores on the Full Scale (Verbal + Performance) show a gradual decline after 20, the drop in score
from age 20 to age 60 being about 20 per cent.

Scoring. Following the directions given in the scoring guide
(Manual), the examiner first adds up the items done correctly
(speed is sometimes a factor) for each of the ten sub-tests. Scores
on each sub-test are then converted into standard scores (page 
38), in which the mean for the 20-34 age group is set at 10 
and the SD at 3. Conversion of the separate sub-test scores 
into a common standard score scale allows the examiner to com- 
bine the tests into a single index and thus to compare the ups and 
downs in performance from sub-test to sub-test. 
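A linear standard score with mean 10 and SD 3 can be sketched in Python. This is only an approximation: the Manual supplies conversion tables, which need not be linear, and the reference means and SD's below are hypothetical.

```python
def scaled_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Linear standard score: mean 10 and SD 3 in the reference (age 20-34) group."""
    return 10 + 3 * (raw - ref_mean) / ref_sd

# Hypothetical reference-group statistics, for illustration only.
information = scaled_score(18, ref_mean=14, ref_sd=4)   # 1 SD above the mean
digit_span = scaled_score(9, ref_mean=11, ref_sd=2)     # 1 SD below the mean
print(information, digit_span)
```

The two raw scores come from tests of different length and spread. On the common scale they become 13 and 7, and these can be compared directly and added.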

The Wechsler-Bellevue does not provide mental ages, since 
the concept of mental age, though useful with children, has little 
meaning when applied to normal adults. The Scale does provide 
for an IQ (called a “deviation IQ”), which is essentially a stand- 
ard score. There are three IQ’s, one from the Verbal, one from 
the Performance, and one from the Full (combined) Scale. In 
each case these deviation IQ's are found in the following way: 
Scores on the sub-tests (10 for the Full Scale) are added and the
total is converted again into a standard score, this time with a
Mean = 100 and an SD = 15 (page 39). At each age level (for
example, at 30, 40, 50) the mean score got from the sub-tests is
set at IQ 100. A score which is one SD above the mean at any
age level then becomes an IQ of 115. Putting the IQ for each age
level at 100 adjusts for the steady fall in total test score with age.
Standard score IQ's or "deviation IQ's" below 100 denote the
same degree of retardation with reference to one's age group.
For example, we read from the Manual that a man aged 35 who
achieves a score of 75 on the 10 tests of the Full Scale has an IQ
of 92—is slightly below the mean of his age group. The same
score of 75 becomes an IQ of 96 at age 45 and an IQ of 100 at
age 60. This means that a total score of 75 on the 10 sub-tests is
"normal" (or "at age") for age 60 and hence receives an IQ of
100. But the score of 75 is below the mean for the younger
groups. Again, the examinee who earns a score of 90 (Full Scale)
has an IQ of 109 if he is 57, an IQ of 102 if he is 37, and an IQ
of 94 if he is 22 years old.
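The deviation IQ is a second standard-score conversion made within each age group. A minimal Python sketch follows. The age-60 mean of 75 follows the text's statement that a total of 75 is "at age" for 60. The age-25 mean and both SD's are hypothetical, and the Manual's actual tables will differ.

```python
# Hypothetical norms: (mean, SD) of the summed scaled scores in each age group.
# Real values come from the Manual's tables; these are for illustration only.
NORMS = {25: (90.0, 15.0), 60: (75.0, 15.0)}

def deviation_iq(total: float, age: int) -> float:
    """Standard score with mean 100 and SD 15 within the examinee's own age group."""
    mean, sd = NORMS[age]
    return 100 + 15 * (total - mean) / sd

print(deviation_iq(75, 60))   # 100.0: at the age-60 mean
print(deviation_iq(75, 25))   # 85.0: 1 SD below the age-25 mean
print(deviation_iq(90, 60))   # 115.0: 1 SD above the age-60 mean
```

The same total earns a lower IQ in the younger group, because younger adults score higher on the average.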

To summarize, Wechsler-Bellevue Scale IQ's are converted, 
or standard, scores in which the mean is always 100 for each age
group and the SD is 15. Wechsler-Bellevue IQ's have the same
meaning from one age to another in the sense that an IQ of 105
or of 86 implies the same relative superiority or inferiority to the
examinee's age group. The Wechsler-Bellevue IQ is a standard
score, whereas the Stanford-Binet IQ is a ratio, MA/CA. The
two indices are highly correlated, but are not equivalent. To
avoid confusion, it helps to write "Wechsler-Bellevue IQ" whenever the deviation IQ is meant. Both the Wechsler-Bellevue IQ
and the Stanford-Binet IQ are measures of abstract intelligence
(page 46).


THE WECHSLER-BELLEVUE SCALE
IN THE SCHOOLS


The Wechsler-Bellevue Scale has been widely used in the
individual study of adolescents and older students for whom the
content of the Stanford-Binet is inappropriate. The test is most
valuable, therefore, to teachers in the upper grades and in high
schools and technical schools.


Range and Stability of Wechsler-Bellevue IQ's 

The range of IQ's in the general school population is about the 
same for the Wechsler-Bellevue Scale as for the Stanford-Binet. 
Table 3-3 will serve, therefore, as a guide in the interpretation of
test scores. Table 3-3 may be taken also as providing a statement
of the educational expectations of older students when we know
the Wechsler-Bellevue IQ. The reliability of the Wechsler-Bellevue Scale, as given by its standard error, is approximately
five points. Hence, the IQ from this scale has about the same
stability as the Stanford-Binet IQ.


The Wechsler-Bellevue Scale in Diagnosis

The Full Scale—like the Stanford-Binet—yields a measure of
a student's general mental level and is often used to provide this
information. The Wechsler-Bellevue, however, has also been
widely employed in mental hospitals and clinics for the diagnosis
of abnormal behavior. The Scale has been useful in the study of 
variations of performance in schizophrenia and other mental dis- 
eases, in senile deterioration, and in assessing the effects of brain 
damage and the results of brain surgery. The fact that there are 
eleven separate tests (six Verbal and five Performance) in the 
Full Scale has led clinical psychologists to attempt to discover the 
relative efficiency of various mental functions from irregularities 
in test performance. 

The diagnosis of differential abilities (strengths and weaknesses) from the sub-tests of the Wechsler-Bellevue must always
be taken as tentative, though an examination of the different sub- 
tests may provide valuable clues. The various tests of the Scale 
are too short and too complex (in that they test overlapping 
abilities) to allow a sweeping judgment to the effect that “Bill 
has poor planning capacity and poor judgment” or that “Mary 
has a good memory and adequate concentration.” Observations 
of this sort are valuable only if made cautiously and taken in con- 
junction with other evidence. The Full Scale is a good index of 
present mental efficiency, and the difference between the Verbal 
and Performance IQ's may be significant of the academic vs. the 
non-academic “mind” (page 75). But judgments drawn from 
specific sub-tests with respect to strengths and weaknesses in 
memory, learning, perception, planning capacity, concentration, 
emotional blocks, and the like must be taken as suggestive rather 
than conclusive. 


WECHSLER INTELLIGENCE SCALE
FOR CHILDREN


Description. The WISC, as it is called, is a downward revision 
of the older Wechsler-Bellevue to render the test more suitable 
for young children. There are ten sub-tests and two alternates 
(twelve in all) in the WISC. The sub-tests have the same form 
and cover the same content as the Wechsler-Bellevue, except 
that easier items have been added. Tests are grouped into five
Verbal and five Performance as follows:


Verbal Scale                Performance Scale
General Information         Picture Completion
General Comprehension       Picture Arrangement
Arithmetic                  Block Design
Similarities                Object Assembly
Vocabulary (Digit Span)     Coding (or Mazes)


The Wechsler Intelligence Scale for Children differs in several 
respects from the Wechsler-Bellevue. In the Verbal Scale,
Digit-Span proved to be less satisfactory than the other tests and
hence became an alternate, Vocabulary being substituted. In
the Performance Scale, coding is a somewhat easier version of the
Digit-Symbol test. Mazes are sometimes given instead of coding,
but coding is usually preferred, since it takes less time
to administer. The maze test is the only test not found in the
Wechsler-Bellevue.


Scope. The WISC is a better made test than the Wechsler- 
Bellevue. To provide norms, one hundred boys and one hundred 
girls were tested at each age level from 5 to 15. Children in the 
standardization sample were drawn from eleven states and from 
three institutions for the feebleminded. The sample was carefully 
checked to give a cross section of geographic areas, urban-rural 
groups, and occupational levels of parents. 


Scoring. As was true of the Wechsler-Bellevue, all sub-tests
were first converted into standard scores in a distribution with
M = 10 and SD = 3. Tables are provided for reading scale
score equivalents to raw scores for each 4-month period from 5
to 15 years. These equally weighted sub-test scores are added
and then again converted into "deviation IQ's," with Mean =
100 and SD = 15 (page 39). Verbal, Performance, and Full
Scale IQ's may be read from appropriate tables in the Manual. 


Approximately 50 per cent of school children can be expected
to earn WISC IQ's between 90 and 110.


Differences Between the WISC and the Stanford-Binet. The 
WISC differs from the Stanford-Binet in several important respects. First, all items of a given sort in the WISC are organized
into sub-tests instead of different kinds of items being placed at
successive age levels. WISC is a point scale rather than an age
scale. Second, the WISC IQ is a deviation IQ—a standard score
in a distribution with Mean = 100 and SD = 15—whereas the
Stanford-Binet IQ is a developmental ratio or MA/CA. The two
IQ's are closely related (the correlations between the two sorts of
scores run from .80 upward), but they are not identical (page
66). The SD of the Stanford-Binet distribution of IQ's is 16, as
against the WISC SD of 15; and some of the difference between
the two IQ's is due to the greater spread of the Stanford-Binet
IQ's. Furthermore, the two mental examinations differ in length, 
variety and difficulty of items. Finally, the WISC provides for 
three IQ's—a Verbal, a Performance and a Full Scale. There is 
only one IQ from the Stanford-Binet, based upon all of the tests 
in the scale. 
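The effect of the different spreads can be shown by placing a child at the same relative standing on both scales. The Python sketch below is a simplification. It treats both IQ distributions as having mean 100, with SD 16 for the Stanford-Binet and 15 for the WISC, and ignores every other difference between the tests.

```python
def iq_at(z: float, sd: float, mean: float = 100.0) -> float:
    """IQ lying z standard deviations from the mean on a scale with the given SD."""
    return mean + sd * z

for z in (-2.0, -1.0, 1.0, 2.0):
    print(f"z = {z:+.0f}: Stanford-Binet (SD 16) {iq_at(z, 16):.0f}, "
          f"WISC (SD 15) {iq_at(z, 15):.0f}")
```

A child 2 SD above the mean gets 132 against 130, and one 2 SD below gets 68 against 70. The discrepancy runs in the same direction as the text's remark that bright children score higher on the Stanford-Binet and dull children higher on the WISC.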


THE WISC IN THE SCHOOLS 


Both the WISC and the Stanford-Binet are widely used with 
school children, and in most cases there is little to choose between 
the two examinations. Many psychologists regard the Stanford-Binet as more satisfactory for use with very young children,
since the WISC is not always easy to administer when the child
is under seven years old. WISC takes less time to give and to
score than does Stanford-Binet, and some examiners prefer it
over the age range of the elementary school. The WISC Full
Scale IQ has a higher correlation with Stanford-Binet IQ than
does either the Verbal or the Performance Scale IQ.

Bright children tend to score higher on the Stanford-Binet than
on the WISC, whereas dull pupils score higher on the WISC.

The separate IQ's of the WISC (Verbal and Performance) are
[...] in verbal and manipulative [...]
child (usually a boy) will do better on
the performance tests of the WISC than on the verbal, indicating,
perhaps, greater aptitude for vocational than for academic sub- 
jects. A bookish youngster who reads a great deal may do much 
better on the verbal tests. The performance IQ is usually higher 
than the verbal in severely disturbed adolescents, and this differ- 
ence often appears also in younger dull students. From the 
manner in which the child handles the verbal tests, the expert 
examiner will often note evidences of insecurity as revealed by 
incoherence, verbosity, poor attention, and defeatism. Poor 
performance on the manipulative tests often reveals inept
planning and defective co-ordination, whereas good performance
shows concentration and adequate sensory-motor organization. 


Range and Stability of the WISC IQ's 


The range of WISC Full Scale IQ's to be expected in the
general school population, and the meaning of these "scores," are
shown in Table 3-4.


TABLE 3-4
Intelligence Classification for WISC IQ's

                                        Percent
IQ Range         Classification         in Each Group
130 and above    very superior           2
120 - 129        superior                7
110 - 119        bright normal          16
90 - 109         average                50
80 - 89          dull normal            16
70 - 79          borderline              7
69 and below     mental defective        2


It will be seen that these classifications correspond closely to those
for Stanford-Binet IQ's.
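The percentages in Table 3-4 are close to what a normal distribution with mean 100 and SD 15 predicts. The Python sketch below shows this. It assumes normality and treats the class limits (70, 80, 90, 110, 120, 130) as cut points on a continuous scale; both are simplifications.

```python
import math

def normal_cdf(x: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """P(IQ <= x) for a normal distribution of deviation IQ's."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Class limits from Table 3-4, treated as cut points on a continuous scale.
BANDS = [
    ("very superior", 130, math.inf),
    ("superior", 120, 130),
    ("bright normal", 110, 120),
    ("average", 90, 110),
    ("dull normal", 80, 90),
    ("borderline", 70, 80),
    ("mental defective", -math.inf, 70),
]

pcts = {label: 100 * (normal_cdf(hi) - normal_cdf(lo)) for label, lo, hi in BANDS}
for label, pct in pcts.items():
    print(f"{label:<17}{pct:5.1f}")
```

The computed values (about 2.3, 6.8, 16.1 and 49.5 per cent, symmetric about the mean) round to the table's 2, 7, 16 and 50. They also bear out the earlier statement that about half of all children fall between 90 and 110.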
Reliability coefficients for the WISC are generally above .90.
They are higher for the Full Scale than for either the Verbal or
Performance Scales. The standard error of a WISC IQ is 4-5
points.


MA's from the WISC 


The WISC does not ordinarily make use of mental age, but when mental ages are required for clinical or for legal reasons, they can readily be determined. The Manual (Appendix E) provides a table of “test-age equivalents to WISC raw scores.” By reference to the table, we find the chronological age of a child for whom a given raw score is typical (or average), and this is the MA corresponding to the score. For example, a score of 12 on the Comprehension test is achieved on the average by children who are 10-6 years old. Hence, a score of 12 in Comprehension has an equivalent MA of 10-6. Either the mean of the sub-test MA's (Mean Test-Age Method) or their median (Median Test-Age Method) is then computed, and either value gives the final over-all MA. A closely equivalent method for determining MA's from the WISC is by use of the formula MA = (IQ/100) × CA. A child who achieves an IQ of 110 and who is 8-2 years old has an MA of 1.10 × 8-2, or approximately 9-0 years.
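The three routes to an over-all MA described above can be sketched in a few lines, working in months to avoid year-month arithmetic. The five sub-test MA's below are hypothetical values standing in for entries read from the Manual's test-age table, and the helper names are illustrative only.

```python
from statistics import mean, median


def to_months(years: int, months: int) -> int:
    """Convert an age written as years-months (e.g. 8-2) to months."""
    return 12 * years + months


def to_years_months(total_months: float) -> str:
    """Round to the nearest month and write the age as years-months."""
    whole = round(total_months)
    return f"{whole // 12}-{whole % 12}"


# Hypothetical sub-test MA's, as if read from a test-age table.
subtest_mas = [to_months(10, 6), to_months(9, 0), to_months(9, 6),
               to_months(8, 9), to_months(10, 0)]
print("Mean Test-Age MA:", to_years_months(mean(subtest_mas)))
print("Median Test-Age MA:", to_years_months(median(subtest_mas)))

# Formula method from the text: MA = (IQ / 100) x CA.
ca = to_months(8, 2)     # 98 months
ma = 110 / 100 * ca      # about 107.8 months
print("Formula MA:", to_years_months(ma))
```

The formula line reproduces the worked example in the text: 1.10 × 98 months is about 108 months, or 9-0.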


PERFORMANCE TESTS 
Development of Performance Tests 


Performance tests designed to measure general mental ability have often been used in the schools (1) as substitutes for the more verbal tests, and (2) as supplements to the Stanford-Binet and other linguistic scales. Performance and non-language tests must of necessity be employed with pre-school children and with the very dull. Such tests are useful additions to the Stanford-Binet or WISC in the mental examination of children with speech and language defects or children with visual and auditory impairment. Batteries of performance tests have long been used in psychological clinics and in institutions for the feebleminded. The classroom teacher should know about performance tests, though he will encounter them much less often than the WISC or the Stanford-Binet.

The Pintner-Paterson Scale of Performance Tests (1917) was the first organized battery of manipulative and non-language tests. Widely used for many years, this scale has now been replaced to a considerable degree by other batteries based upon it. The Pintner-Paterson Scale consists of fifteen separate tests. The ten tests most often used (in what is called the Shorter Scale) include four form boards, three picture completion tests (of the jigsaw puzzle type), two object assembly tests, and one block-counting test.

Later performance scales are the Cornell-Coxe Performance 
Ability Scale (1934) and the Arthur Point Scale of Performance 
Tests (1930, revised 1947). These test batteries draw heavily on 
the Pintner-Paterson, but include, too, important additions and 
revisions. In addition to these test batteries, there are a number 
of other performance tests, of which a graduated series of mazes, 
the Porteus Mazes, is the best known. Widely used types of per- 
formance tests are the object assembly (page 65), various form 
boards, block counting, and block design. Two of these, block 
design and object assembly, are found in the Wechsler-Bellevue 
Scale. 

Norms are generally available for the individual performance tests, so that one may use one or more tests without having to administer the whole scale.


The Arthur Scale

The Arthur Point Scale has been widely used over the age range of the elementary school. It is made up of performance tests taken from various sources; it was first published in 1930 and revised in 1947. The later edition is a considerable improvement over the original insofar as standardization is concerned, and the Scale is a good example of a performance battery designed for children. Figure 3-5 (facing page 53) shows the five tests in the Arthur Point Scale, which are as follows:


Knox Cube. The four cubes (see Figure 3-5) are tapped in a 
certain order by the examiner, for example, cube 1, cube 4, 
cube 2, cube 3. The child is told to imitate the tapping order. 
Tapping sequences become longer and more complex, until 
they can no longer be done by the child. 

Seguin Form Board. Ten common geometric forms (Figure 3-5) are to be fitted into the right apertures in the board.

Porteus Mazes. The child is told to trace the shortest path from the entrance to the exit in a maze, not lifting the pencil from the paper. If he makes an error by crossing a line or entering a wrong pathway, he is stopped and given a second trial. Mazes increase in difficulty from the 3-year level to the adult level.

Healy Picture Completion II. As shown in Figure 3-5, the test 
shows successive scenes in a boy’s life during a typical school 
day. Small pieces or blocks have been cut out of the scene. The 
child must select the appropriate pieces from the box and fit
them in place.

Arthur Stencil Design Test. The child must reproduce designs 
of increasing complexity. Standard designs to be copied are 
presented on cards. Each design can be reproduced by fitting 
together stencils in different colors on a solid white card. 
Several stencils are needed for the more detailed designs. 


Scope. The Arthur Performance Scale covers an age range from about 4 years to maturity, but is used chiefly with younger children. The Scale is employed mainly as a clinical test supplementary to, or as a substitute for, the Stanford-Binet.


Scoring. Scores on the sub-tests (based on accuracy and time) are first converted into point scores. These are combined and converted into mental ages. MA's are chronological ages which are typical for given combined scores. Thus if the average child of 10-0 scores 31 points, a score of 31 points becomes an MA of 10-0. MA is divided by CA to give a “performance IQ.” These IQ's are not equivalent to the IQ's from verbal intelligence scales and are not to be so regarded. Arthur Scale IQ's should always be described as “Arthur Scale IQ's.”
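The conversion just described (point score to MA, then MA divided by CA) can be sketched as a table lookup followed by a division. This is only an illustration: the norms below are hypothetical except for the 31-points-to-10-0 pair given in the text, and straight-line interpolation between table entries is an assumption, not the procedure prescribed by the Arthur Manual.

```python
from bisect import bisect_left

# Hypothetical norms: (combined point score, MA in months).
# Only 31 points -> 120 months (10-0) comes from the text.
NORMS = [(25, 96), (28, 108), (31, 120), (34, 132)]


def point_score_to_ma(score: float) -> float:
    """Interpolate an MA (in months) from the hypothetical norms table."""
    scores = [s for s, _ in NORMS]
    if not scores[0] <= score <= scores[-1]:
        raise ValueError("score lies outside the norms table")
    i = bisect_left(scores, score)
    if scores[i] == score:
        return float(NORMS[i][1])
    (s0, m0), (s1, m1) = NORMS[i - 1], NORMS[i]
    return m0 + (score - s0) * (m1 - m0) / (s1 - s0)


def ratio_iq(ma_months: float, ca_months: float) -> int:
    """Ratio IQ: 100 x MA / CA, rounded to a whole number."""
    return round(100 * ma_months / ca_months)


ma = point_score_to_ma(31)        # 120 months, i.e. 10-0
print(ratio_iq(ma, 9 * 12 + 4))   # a child aged 9-4
```

Raising an error outside the table, rather than extrapolating, keeps the sketch from inventing MA's the norms do not support.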


Performance Tests in the Schools 


Correlations between scores on the Arthur Scale and the 
Stanford-Binet are fairly high (.50 or more). The two tests are, 
however, not measuring exactly the same functions, and hence 
the Arthur IQ is often used as a "performance supplement" to 
the Stanford-Binet IQ. Arthur IQ's are higher than Stanford- 
Binet IQ's when the latter are low, that is, below 90; and this 
discrepancy is especially striking when children are very dull. 
There is evidence that low performance test scores may be in- 
dicative of behavior problems and of emotional instability. This 
result probably grows out of the disturbed child's poor attention 
span, poor perception of relations and ineptitude in manual activ- 
ities. Emotional involvement may take expression in bizarre and 
unusual responses. 

For the classroom teacher the main value of a performance test lies, perhaps, in the fact that such tests (1) may reflect poor language development or lack of language training, and (2) are often indicative of cultural and educational handicaps. As pointed out on page 68, a comparison with verbal tests often reveals, for instance, children whose manual and manipulative skills (“concrete intelligence”) run ahead of their verbal facility (“abstract intelligence”). Performance tests serve, too, to identify the shy and inarticulate child who is brighter than the verbal tests show. Performance tests are not especially useful with normal school children over 12 years of age, and they rarely differentiate significantly among older bright children.


Case Histories. The following brief case histories will illustrate how performance tests, when used together with verbal tests, may provide a better understanding of a pupil's capabilities. Recommendations in most cases must be tentative and subject to possible revision in the light of further information.


Case I. Donald B.: age, 10-2; Stanford-Binet IQ, 92; Arthur Scale IQ, 106.

Donald is doing poor work in the fifth grade. His father is a 
barber, his mother a clerk (part-time) in a store; neither parent 
went beyond the seventh grade. There are three other children 
in the family, all younger than Donald. There are few books in 
the home, but the family owns a TV set and a new automobile. 
Donald reads the sports page and the comics in the daily news- 
paper, but little else. He talks in brief sentences and is generally 
unresponsive in school. He is a well-grown boy for his age, a
good athlete, and is well accepted by his classmates. He has
never been a behavior problem.

Recommendation: Donald's performance IQ is fourteen points 
higher than his Stanford-Binet IQ. In view of his relatively 
meager abstract intelligence, this boy is probably doing as well 
as we can expect. He may get to high school, but will almost 
certainly not complete more than one year. Vocational training 
seems to be indicated. He will continue to have trouble with 
verbal subjects, but may be very successful at a skilled trade. 


Case II. Joan M.: age, 8-3; Stanford-Binet IQ, 126; Arthur Scale 
IQ, 109. 

Joan is doing excellent work in the fourth grade. Her problem 
is social rather than scholastic. Her father is dead, and her mother, 
a widow, is a successful dress designer. Joan is an only child and
is alone much of the time. She reads a great deal but has few close
friends and is often left out of class activities. She has a tendency 
to daydream and is shy and withdrawn. 

Recommendation: Joan's low performance IQ, coupled with
her high Stanford-Binet IQ, indicates a lack of experience with
“concrete” activities, such as running, playing out-of-doors,
skipping rope, dancing, and the like. This lack of opportunity to
develop manual skills is often found in children reared in a large
city. Joan's mother may be encouraged to meet with other classroom
mothers, arrange parties, and invite Joan's classmates to
her home. The aid of the physical education teacher in getting 
Joan into games should also be sought. The classroom teacher 
can often see to it, by suggestion and indirection, that Joan is 
included in class parties and out-of-class activities. 


Case III. Bob W.: age, 9-2; Stanford-Binet IQ, 104; Arthur Scale 
IQ, 115. 

Bob is doing satisfactory work in the fourth grade. He is shy 
and timid, with a slight tendency to stammer, especially when 
questioned. He is one of four children, the other three being girls 
all older than he. Bob's father is a successful lawyer, and his 
mother is a college graduate and a prominent club woman. The 
parents have decided that Bob, as the only boy, is to be a
professional man, preferably a physician (his grandfather was a
well-known surgeon). They are dissatisfied with Bob's marks, and
are sure he is intelligent and that the teacher is to blame. 

Recommendation: Bob is clearly a normal boy. He is not 
bright, though he probably is brighter than his Stanford-Binet 
IQ indicates. The parents must somehow be reconciled to the
fact that (1) Bob is not of professional caliber, and (2) a lower
vocational goal (one within Bob's intellectual grasp) will make
for a happier boy and probably a much happier life. They must
be urged not to scold the boy and thus make him feel more
inferior than he already does. This is a difficult problem, because 
it is the parents—not the child—who have to be “sold” on a 
different program from the one they have planned. 


SUGGESTIONS FOR FURTHER READING


General:
Anastasi, A. Psychological Testing. New York: Macmillan, 1954.
Cronbach, L. J. Essentials of Psychological Testing. New York: Harper, 1949.
Freeman, F. S. Theory and Practice of Psychological Testing (Rev. Edition). New York: Holt, 1955.

Specific:
Manual for Administering and Scoring the Tests. New York: Psychological Corp., 1947.

McNemar, Q. The Revision of the Stanford-Binet Scale: An Analysis of the Standardization Data. Boston: Houghton Mifflin, 1942.

Terman, L. M., and Merrill, M. A. Measuring Intelligence. Boston: Houghton Mifflin, 1937.

Wechsler, D. The Measurement of Adult Intelligence (3rd edition). Baltimore: Williams and Wilkins, 1944.

Wechsler, D. Wechsler Intelligence Scale for Children. Manual. New York: Psychological Corp., 1949.


SUGGESTIONS FOR LABORATORY WORK 


1. Examine the Stanford-Binet items at ages 4, 8, 12, and Superior 
Adult. Classify the items at each age level as verbal, numerical, spatial- 
perceptual (for example, mazes and the like), and performance (manip- 
ulative). Add other categories if you need them. Which category has 
the largest number of items? 

2. Have members of the class pair off and test each other. Be sure to
follow the Manual carefully. Results from this "test" will not be indica- 
tive of mental ability, to be sure, but following the procedure is a good 
way to learn about the test. 

3. Repeat (1) and (2) for the Wechsler Intelligence Scale for Children. 
For (1), sample the items of each test.

4. Go over the Manual of the Arthur Point Scale. If materials are 
available, administer the Scale to a child before the class. 


QUESTIONS FOR DISCUSSION 


1. What importance do you attach to the fact that test items in 
Stanford-Binet become more "verbal" as we go up the age scale?
2. Which test, Stanford-Binet or Arthur Point Scale, would you expect 
to prove more effective in the following situations: 
a) selecting children for a special class for the gifted
b) selecting children for remedial work in a "slow" class
c) studying children with reading problems 
d) testing children with speech defects 
3. A child taken from public school and entered in a private school 
is reported by his mother to have shown an increase in IQ of 20 points
after six months in the “new” school. Assuming the story to be true, 
what is misleading about it? What might account for the change in the
IQ? 

4. A high school boy of 16 has a Wechsler-Bellevue IQ of 132. What 
advice would be justified by this fact alone? 

5. Look over the items in the WISC. Which do you think depend 
primarily on schooling? Do the same for the Stanford-Binet. Which test 
is the more "school centered"? 

6. Terman states that the vocabulary test gives the closest approxima- 
tion to total performance on Stanford-Binet. What does this tell us 
about the nature of the Stanford-Binet IQ? 

7. In deploring the reading interests, TV programs, and voting habits 
of the American adult, critics have said that the average mental age of 
the adult is about 14 (sometimes this is 12 or 15). What does mental age 
signify here, if anything definable?

8. Does a child with an IQ of 80 possess 80 per cent of normal intelli- 
gence? Explain your answer. 




CHAPTER 4 


GROUP TESTS OF INTELLIGENCE 


Group and Individual Tests of Intelligence 


Group tests of intelligence are much like individual tests except that (1) they are administered like school examinations, and (2) they are objective in form: they are answered by checking or circling a number or letter, or by marking one of several possible responses. Group tests contain both verbal and non-verbal materials. Items of the first sort are expressed in words and numbers; non-verbal test items, on the other hand, consist of problems presented in pictures and diagrams. There is a minimum of language and little or no reading required in non-verbal items. Intelligence tests for pre-school and first-grade pupils are of necessity non-verbal, though directions are given orally. Intelligence tests in the elementary grades contain both verbal and non-verbal items. At the high school and college levels, test items are mostly verbal, mathematical, and abstract, but even here many problems are presented in pictorial and spatial forms.

Group tests of intelligence confront the examinee with tasks like those found in the individual intelligence scales. Both types of test minimize routine school learning and emphasize mental alertness by presenting problems which demand reasoning, generalization, and the manipulation of “ideas.” But there are differences, too, between the two sorts of test. In individual intelligence scales, questions are stated orally and are answered orally; moreover, problems are presented one at a time without time limit, or a generous limit is allowed. In group intelligence tests, questions are printed in a booklet, time limits are fixed, and answers are limited to the options provided. The group test is more dependent on reading than is the individual test, it is less flexible in response, and it is often disturbing to children who are easily flustered by a time limit. When a child's school work and/or the teacher's opinion of his abilities do not jibe with his group test score, it may be advisable to check the group test result against the Stanford-Binet. Group tests, like individual scales, are concerned almost entirely with the abstract level of intelligence (page 46).
The first group tests to be widely used were the two intelligence examinations developed for use in the army during World War I (1917-1918). Army Alpha consisted of eight sub-tests: Following Directions, Arithmetic Problems, Best Answers, Disarranged Sentences, Same-Opposites, Number Series Completion, Analogies, and Information. Army Beta made use of diagrams and pictures, and directions were given in pantomime. During World War II, the Army General Classification Test (AGCT) was developed as a measure of general ability. Unlike Alpha, the items in AGCT were not grouped into sub-tests, but were printed in ascending order of difficulty. A civilian edition of AGCT is now available.
REPRESENTATIVE GROUP TESTS OF INTELLIGENCE


This section will describe several tests of general intelligence 
covering the age range from pre-school to college. These test 
batteries have been chosen for illustration because they are well 
standardized, are widely used in the schools, and are representa- 
tive of a large assortment of group tests designed to measure 
general ability. They are not necessarily the best mental examina- 
tions for every testing situation nor for every school. Selection 
of a “best” test will depend on the objectives which the school 
hopes to achieve, the time available for testing, and the money 
and personnel which the school has available. 


GROUP TESTS OF INTELLIGENCE

1. Pintner-Cunningham Primary Test
2. California Test of Mental Maturity
3. Otis Quick-Scoring Mental Ability Tests
4. Kuhlmann-Anderson Intelligence Tests
5. Terman-McNemar Test of Mental Ability
6. American Council on Education Psychological Examination


1. The Pintner-Cunningham Primary Test* 


Description. This test includes seven non-verbal sub-tests described as follows:

1. Common Observation: Child marks all of the objects in a 
given set which fit into some category. (See Figure 4-1, row 1.)

2. Aesthetic Differences: The child is told to mark the 
“prettiest” (that is, best) of three drawings of the same object. 
(Figure 4-1, row 2.)

3. Associated Objects: The child marks the two objects that 
belong together in each row of pictures—as, for example, the 
hat and the coat. (Figure 4-1, row 3.) 

4. Discrimination of Size: The pupil is instructed to mark the items of clothing which are of the right size for the individual pictured. For each article of clothing (shoes, hat, gloves, etc.), one is too large, one is too small, and one is of the right size.

* Published by the World Book Company, Yonkers-on-Hudson, N. Y.

FIGURE 4-1 Illustrative Items from the Pintner-Cunningham Primary Test
Test 1. Mark the things that Mother uses when she sews her apron.
Test 3. Mark the two things that belong together.
Test 7. Look at each picture. See how it is drawn. Make another one like it in the dots.
Reproduced by permission of the World Book Company.

5. Picture Parts: In this test a series of pictures of increasing 
complexity is shown. These contain children, toys, animals, and 
other items. The same items are shown outside the "standard" 
picture, mixed in with other objects. The child is instructed to 
mark all of the objects in this group which appeared in the
picture. 

6. Picture Completion: In each incomplete picture, the pupil 
is asked to locate and mark the correct missing part from among 
several parts shown. 

7. Dot Drawing: The child is to copy drawings which are 
formed by joining dots. See Figure 4-1, row 4. 

All the tests are non-verbal, since most of the children for 
whom the test is intended have not learned to read. Directions 
are given orally. 


Scope. The Pintner-Cunningham Primary Test covers the kindergarten, Grade I, and the first half of Grade II. There are three equivalent forms, A, B, and C.


Scoring. Scores from the seven sub-tests are combined to give a total point score. Mental ages corresponding to point scores may be read from tables in the Manual. Pintner-Cunningham MA's are chronological ages for which the given point scores are typical (see page 33). These MA's are divided by the child's CA to obtain an IQ. An alternative, and better, procedure is to convert the point scores into deviation IQ's, following the method used in the Wechsler-Bellevue. The mean IQ is, of course, 100 and the SD is 16, equal to that of the Stanford-Binet. Pintner-Cunningham IQ's are not strictly equivalent to Stanford-Binet IQ's, though the same abilities appear to be measured by the two scales. Correlations between the two test batteries run from .70 to .90 for kindergarten and primary school children. This indicates that Pintner-Cunningham is a valid measure of the abstract intelligence measured by Stanford-Binet. The reliability or stability of Pintner-Cunningham scores is high, as shown by the close correspondence of one form with another.
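A deviation IQ places a child's point score relative to his own age group: the score is expressed in standard-deviation units from the age-group mean and then rescaled to a mean of 100 and, here, an SD of 16. The age-group mean and SD below are hypothetical stand-ins for values that would come from the test's norms, and `deviation_iq` is an illustrative name.

```python
def deviation_iq(score: float, group_mean: float, group_sd: float,
                 iq_sd: float = 16.0) -> int:
    """Place a point score on a scale with mean 100 and SD iq_sd.

    group_mean and group_sd describe the child's own age group in the norms.
    """
    z = (score - group_mean) / group_sd  # distance from the mean in SD units
    return round(100 + iq_sd * z)


# Hypothetical age-group norms: mean point score 40, SD 8.
print(deviation_iq(50, 40, 8))
print(deviation_iq(34, 40, 8))
```

Because each child is compared only with his own age group, the same point score can give different deviation IQ's at different ages, which is the advantage over dividing MA by CA.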


2. The California Test of Mental Maturity (CTMM)* 


Description. These tests contain both verbal and non-verbal materials. Sub-tests are grouped under the following five heads: memory, spatial relations, logical reasoning, numerical reasoning, and verbal concepts. Each of these categories is represented by from two to four tests. The profile in Figure 4-2 gives the names and classification of these sub-tests. The first three tests in each California battery are designed to measure visual acuity, auditory acuity, and motor co-ordination. These tests, which are in a separate booklet, are rough screening devices intended to identify children too handicapped to be correctly classified by the test battery.

* Published by the California Test Bureau, Los Angeles, Calif.

FIGURE 4-2 Profile for the California Test of Mental Maturity


Scope. The California test series covers the range from kindergarten to college. The five batteries are as follows:

1. Pre-primary level: kindergarten and first grade
2. Primary level: grades 1-3
3. Elementary level: grades 4-8
4. Junior high level: grades 7-9
5. Advanced level: grades 10-college and adult

These test batteries require about 1½ hours working time. They are relatively easy to administer and to score.


Scoring. Separate scores are obtained for each of the five areas (called “factors”) into which the sub-tests have been grouped. There are also scores (and mental ages) based on (1) the language or verbal tests alone, (2) the non-language tests alone, and (3) the test as a whole. From these three scores, separate MA's may be read from tables in the Manual. Language, non-language, and total-test IQ's are found by dividing the appropriate MA by the child's CA. Percentile ranks are also provided for each of the five “mental factors.” These PR's may also be read from appropriate tables.

A special feature of the CTMM is the use of a profile or chart as an aid in analysis and diagnosis. As shown in Figure 4-2, the highs and lows of a pupil's performance in the five areas may be readily seen from their positions on the profile. Along the right-hand margin of the chart, percentile ranks (PR's) are entered for each factor, as well as for total score and for the language and non-language parts of the test. These PR's give the student's standing on a scale of one hundred points (page 33). If the PR is 50, the child stands just in the middle of his age group; if his PR is 70, then 70 per cent of his age group fall below him in the given test.
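The definition of a percentile rank just given (the per cent of the age group falling below a child) translates directly into code. The group of ten scores below is hypothetical, chosen so that a score of 30 reproduces the PR of 70 used as the example above.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Per cent of the group whose scores fall below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


# Hypothetical scores for one age group.
group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
print(percentile_rank(30, group))
print(percentile_rank(22, group))
```

Published tables often also count half of any scores tied with the child's own; this sketch follows the simpler "falls below" wording of the text.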
The validity of the CTMM was determined through its correlations with Stanford-Binet and other standard mental tests. The tests appear to be very homogeneous (to measure the same abilities) over the age range from pre-school to college. The reliability of the language, non-language, and total scores is high: reliability coefficients of these part scores and of the factor scores range from .87 to .95 over grades 4-6.
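Validity coefficients of this kind are correlations between two sets of scores earned by the same children. A minimal sketch of the product-moment correlation follows; the text does not say which correlation method was used, so this standard formula is an assumption, and the paired IQ's are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation between paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical IQ's for six children on a group test and on the Stanford-Binet.
group_test = [95, 102, 110, 88, 120, 105]
binet = [98, 100, 115, 90, 118, 101]
print(f"r = {pearson_r(group_test, binet):.2f}")
```

An r near +1 means the two tests rank the children in nearly the same order; the same formula, applied to two forms of one test, gives a reliability coefficient.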


3. Otis Quick-Scoring Mental Ability Tests* 


Description. These tests differ from many group tests of intelligence in that the test items are not grouped into separate sub-tests according to type of item. Instead, the different items (analogies, arithmetic problems, opposites, and the like) are printed in a continuous repetitive pattern, so that items of a certain sort (opposites, for example) follow each other at stated intervals. This arrangement is sometimes called a “scrambled” test, or more precisely a spiral omnibus arrangement. Items are progressively more difficult from the start to the finish of the test.
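The spiral omnibus idea can be sketched as a simple rotation: take the easiest item of each type in turn, then the next easiest of each type, and so on. This is a simplified model under stated assumptions (each pool already sorted from easy to hard, strict rotation through the types); the item labels are hypothetical, and the actual Otis arrangement need not follow so rigid a pattern.

```python
from itertools import zip_longest

# Hypothetical item pools, each sorted from easiest to hardest.
pools = {
    "analogy": ["A1", "A2", "A3"],
    "arithmetic": ["N1", "N2", "N3"],
    "opposite": ["O1", "O2", "O3"],
}


def spiral_omnibus(pools: dict[str, list[str]]) -> list[str]:
    """Interleave item types cycle by cycle, so each type recurs at a
    fixed interval and difficulty rises from one cycle to the next."""
    order = []
    for cycle in zip_longest(*pools.values()):
        order.extend(item for item in cycle if item is not None)
    return order


print(spiral_omnibus(pools))
```

`zip_longest` lets a shorter pool simply drop out of later cycles instead of cutting the whole test short.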

The following items are like those in the Otis Beta Test,** an examination prepared for grades 4-9.

1. Which of the five things below is soft?
   1. glass  2. stone  3. cotton  4. iron  5. ice
2. A robin is a kind of
   1. plant  2. bird  3. worm  4. fish  5. flower
3. Hat is to head as shoe is to
   1. arm  2. leg  3. foot  4. fit  5. glove
4. North
   1. hot  2. east  3. west  4. down  5. south
5. At five cents each, how many pencils can be bought for 40 cents?
   1. 45  2. 8  3. 200  4. 5  5. 12

* Published by the World Book Company, Yonkers, N. Y.
** The first two items are samples from the Beta Test. Other items are like those found in the test.
Scope. The Otis Quick-Scoring Tests cover the age range from Grade I through college. There are three batteries, as follows:
Alpha Test (90 items): non-verbal; grades 1-4
Beta Test (80 items): verbal, numerical, and spatial; grades 4-9
Gamma Test (80 items): verbal primarily; high school and college

Scoring. The Otis tests are easy to administer, and scoring is facilitated by a cutout stencil which can be superimposed on the test booklet. The tests are virtually “self-administering.” There is a single time limit, which varies from twenty to thirty minutes. Mental age equivalents to total score are read from tables in the Manual. The Otis IQ's are deviation scores and are measures of brightness. These IQ's are only generally comparable to Stanford-Binet IQ's; the two “scores” are not equivalent. The reliability of the Otis tests is high.
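What the cutout stencil does by eye (exposing only the correct answer positions and counting the marks that show through) amounts to comparing each response with a key. The key and the pupil's answers below are hypothetical, and `score_with_key` is an illustrative name.

```python
def score_with_key(responses: list, key: list) -> int:
    """Count responses that match the key; omitted items (None) earn nothing."""
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


key = [3, 2, 3, 5, 2]            # hypothetical answer key, one option number per item
pupil = [3, 2, 1, 5, None]       # wrong on item 3, item 5 left blank
print(score_with_key(pupil, key))
```

The resulting raw score is what would then be carried to the Manual's tables for an MA or IQ.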


4. Kuhlmann-Anderson Intelligence Tests*


Description. This is a series of thirty-nine separate sub-tests 
grouped into nine overlapping test batteries. The sub-tests include 
verbal and non-verbal materials. The early levels are entirely 
pictorial, but the tests become more verbal and abstract as we 
go up the age scale and finally are entirely verbal. Each test 
battery consists of ten sub-tests. 


Scope. Each of the nine batteries is printed in a separate booklet and is designed to cover one or more grade levels, as follows:
Kindergarten    sub-tests 1-10
Grade 1         sub-tests 4-13
Grade 2         sub-tests 8-17
Grade 3         sub-tests 12-21
Grade 4         sub-tests 15-24
Grade 5         sub-tests 19-28
Grade 6         sub-tests 22-31
Grades 7-8      sub-tests 25-34
Grades 9-12     sub-tests 30-39

Administration of the K-A tests is somewhat more difficult than with the Otis, since the tests in the batteries are often separately timed. K-A requires from 30 to 45 minutes to administer.

* Published by the Personnel Press, Inc., 180 Nassau Street, Princeton, N. J.


Scoring. In setting up a scoring plan, the authors of K-A have employed what is called the “median mental age” method. This may be described briefly as follows. Each of the ten sub-tests in a battery yields a mental age. These MA's (see page 32) are chronological ages for which a given score is typical or average. Thus if the children who are 10 years and 2 months old earn in general a score of 21 on a given sub-test, then the score of 21 corresponds to, or is equivalent to, an MA of 10-2 on this sub-test. MA's are read from tables in the Manual. The median MA is the median of the ten sub-test MA's.* This is taken to be the most representative measure of a child's over-all ability.

An IQ for the battery is found by dividing the median MA by the child's life age, or CA. This IQ is not equivalent to the Stanford-Binet IQ, though it is related to it. The K-A tests measure verbal or abstract intelligence primarily, especially at the upper age levels. The reliability of the K-A, as shown by the stability of its test scores, is very high. Reliability coefficients have been computed for grades 1 through 9 separately: these range from .89 to .97.
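The median-mental-age method can be sketched as follows, with ages in months. The ten sub-test MA's and the child's CA are hypothetical. Python's `statistics.median` averages the fifth and sixth values of an even-sized list, which is the point found by counting off five scores from either end of the ten.

```python
from statistics import median

# Hypothetical MA's (in months) from the ten sub-tests of one K-A battery.
subtest_mas = [118, 122, 120, 126, 115, 124, 119, 130, 121, 117]

median_ma = median(subtest_mas)      # average of the 5th and 6th values in order
ca = 9 * 12 + 8                      # chronological age 9-8, in months
iq = round(100 * median_ma / ca)     # battery IQ = median MA / CA

print(f"median MA = {median_ma} months, IQ = {iq}")
```

Using the median rather than the mean keeps one unusually high or low sub-test MA from pulling the over-all figure.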


5. Terman-McNemar Test of Mental Ability** 


Description. This test is designed for high school students. It is a measure largely of ability to read and comprehend fairly difficult prose. Two numerical sub-tests found in an earlier edition of the test were eliminated in order to render the test more unified in content. As it stands, we have a highly verbal battery.


* When ten scores are arranged in order of size, the point (or score) found by counting off five scores from either end of the series is the typical value or median (see page 20).
** World Book Company, Yonkers-on-Hudson, N. Y.
There are seven sub-tests, as follows: information, synonyms, logical selection, classification, analogies, opposites, and best answer. Sample items and instructions for each item type are shown in Figure 4-3. These items are easier than are the items found in the test proper and are for illustration. Items in the test are graded in difficulty from easy to hard.


FIGURE 4-3 Sample Items from the Terman-McNemar Test of Mental Ability (Form C)

TEST 1. INFORMATION
Mark the answer space which has the same number as the word that makes the sentence TRUE.
Sample. Our first President was
1 Adams 2 Washington 3 Lincoln 4 Jefferson 5 Monroe

TEST 2. SYNONYMS
Mark the answer space which has the same number as the word which has the SAME or most nearly the same meaning as the beginning word of each line.
Sample. correct: 1 neat 2 fair 3 right 4 poor

TEST 3. LOGICAL SELECTION
Mark the answer space which has the same number as the word which tells what the thing ALWAYS has or ALWAYS involves.
Sample. A cat always has
1 kittens 2 spots 3 milk 4 mouse 5 hair

TEST 4. CLASSIFICATION
In each line below, four of the words belong together. Pick out the ONE WORD which does not belong with the others, and mark the answer space bearing its number.
Samples.
1 dog 2 cat 3 horse 4 chicken
6 hop 7 run 8 stand 9 skip 10 walk

TEST 5. ANALOGIES
Study the samples carefully.
Ear is to hear as eye is to
1 [illegible] 2 glasses 3 [illegible] 4 wink
Hat is to head as shoe is to
6 arm 7 leg 8 foot 9 fit 10 glove
DO THEM ALL LIKE THE SAMPLES

TEST 6. OPPOSITES
Mark the answer space which has the same number as the word which is OPPOSITE, or most nearly opposite, in meaning to the beginning word of each line.
Sample. north: 1 hot 2 east 3 west 4 down 5 south

TEST 7. BEST ANSWER
Read each statement and mark the answer space which has the same number as the answer which you think is best.
Sample. [illegible] not put a burning match in the wastebasket because
1 Matches cost money. 2 We might need a match later. 3 It might go out. 4 It might start a fire.

Reproduced by permission of the World Book Company.
Scope. The Terman-McNemar is planned specifically for
grades 7 through 12 and for college freshmen. 

Scoring. Total raw score is converted into a scaled-score IQ,
which is closely related to Stanford-Binet IQ. Scores may also
be expressed as MA’s and as percentile ranks. The working time
for the test is about 50 minutes. The Terman-McNemar comes in
two equivalent forms. In the construction of the test, a careful
item analysis was made (page 214) in order to weed out unsatisfactory
items; this is offered as evidence of the test’s validity.
The reliability of the Terman-McNemar is reported to be .96
for a single age level.
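The text leaves the details of the item analysis to page 214. As one common way of doing it, assumed here and not necessarily the procedure used for the Terman-McNemar, each item can be summarized by its difficulty (the proportion of pupils passing it) and by a discrimination index (the pass rate in the highest-scoring 27 per cent of pupils minus the pass rate in the lowest 27 per cent). The function names, the response data, and the 0.2 cutoff for flagging weak items are all hypothetical.

```python
def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    responses: one list of 0/1 item scores per pupil (1 = correct).
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    # Rank pupils from highest to lowest total score.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(group_fraction * n_pupils))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(n_items):
        # Difficulty: proportion of all pupils passing the item.
        difficulty = sum(pupil[i] for pupil in responses) / n_pupils
        # Discrimination: upper-group pass rate minus lower-group pass rate.
        discrimination = (sum(p[i] for p in upper) - sum(p[i] for p in lower)) / k
        results.append((difficulty, discrimination))
    return results


def weak_items(results, min_discrimination=0.2):
    """Item numbers (1-based) whose discrimination falls below the cutoff."""
    return [n for n, (_, d) in enumerate(results, start=1) if d < min_discrimination]


# Ten hypothetical pupils answering five items.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
]

stats = item_analysis(responses)
for number, (p, d) in enumerate(stats, start=1):
    print(f"item {number}: difficulty {p:.2f}, discrimination {d:.2f}")
print("candidates for revision:", weak_items(stats))
```

Item 5, which every pupil passed, has a discrimination of zero: it does not separate the abler pupils from the less able and would be a candidate for revision or removal.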


6. American Council on Education 
Psychological Examination* 
Description. This battery of tests is designed to measure scholastic
aptitude, or learning ability in school. It comes in two
forms, one for high-school students and another for college 
freshmen. The college test consists of six sub-tests, as follows: 


1. Arithmetic problems: 20 problems in multiple-choice form,
of the "mental arithmetic" variety. 

2. Completion: 30 items in multiple-choice form. The test 
demands word knowledge and definitions. 

3. Figure Analogies: 30 multiple-choice items. Analogies in- 
volve geometric forms, areas, angles, spatial arrangements. 

4. Same-Opposite: 50 multiple-choice items which demand 
vocabulary and word knowledge. 

5. Number Series: 30 items to be completed "logically" with 
appropriate numbers. 

6. Verbal Analogies: 40 items requiring relation-finding in verbal terms.


In the ACE college form, sub-tests 1, 3, and 5 are combined to
give a quantitative, or Q, score; sub-tests 2, 4, and 6 are combined
to give a linguistic, or L, score. Each sub-test is separately timed
and each is preceded by a practice exercise. In the high-school
form of the ACE, tests 3 and 6 are dropped, leaving four sub-tests.
Completion and same-opposite are combined to give the L
score, and arithmetic and number series to give the Q score.

* Published by the Educational Testing Service, Princeton, N. J.
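The grouping of sub-tests into Q and L scores can be written out directly. This sketch assumes that each part score is a simple sum of raw sub-test scores and that the total is Q plus L; the text says only that the sub-tests are "combined," and in practice the scores are interpreted through the published percentile norms. The dictionary keys and the student's raw scores are hypothetical.

```python
# Sub-tests combined into each part score (names follow the lists in the text).
COLLEGE_FORM = {
    "Q": ["arithmetic", "figure_analogies", "number_series"],
    "L": ["completion", "same_opposite", "verbal_analogies"],
}
HIGH_SCHOOL_FORM = {
    "Q": ["arithmetic", "number_series"],
    "L": ["completion", "same_opposite"],
}


def ace_part_scores(raw, form):
    """Sum raw sub-test scores into Q, L, and total (simple addition is assumed)."""
    q_score = sum(raw[name] for name in form["Q"])
    l_score = sum(raw[name] for name in form["L"])
    return {"Q": q_score, "L": l_score, "total": q_score + l_score}


# Hypothetical raw scores for one student on the college form.
raw = {
    "arithmetic": 12,
    "completion": 20,
    "figure_analogies": 18,
    "same_opposite": 35,
    "number_series": 15,
    "verbal_analogies": 22,
}
print(ace_part_scores(raw, COLLEGE_FORM))
```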


Scope. The ACE is the most difficult of the general intelligence 
tests described so far. Testing time varies from about forty min- 
utes (high school form) to sixty minutes (college form). 


Scoring. The three scores from the ACE—the quantitative, the
linguistic, and the total—may be converted into percentile ranks.
Extensive norms (in PR’s) are published annually covering test
results from previous years. Q scores have been found to correlate
with achievement (grades) in mathematics and science,
but the L score has the higher correlation with general achievement
in high school.
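A percentile rank (PR) states what percentage of a norm group scored below a given raw score. The sketch below assumes the norm group's raw scores are at hand and counts half of any tied scores, a common convention; published norm tables may be built somewhat differently. The norm scores are hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group below `score`, counting half of any ties."""
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Hypothetical raw scores of a small norm group.
norm_group = [40, 45, 50, 55, 60, 60, 65, 70, 75, 80]
print(percentile_rank(60, norm_group))
print(percentile_rank(72, norm_group))
```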

The predictive validity (page 31) of the ACE, as determined
over several years, is high. ACE correlates from .40 to .60 with
college grades; its correlations with Stanford-Binet average about
.65. The reliability coefficients of Q, L, and total score are all
very high. One feature of the ACE is the publication of norms
for different groups. Separate norms are available for boys and
girls and for three types of college—4-year, 2-year (junior),
and teachers’ colleges. Although the 4-year colleges achieve
higher mean scores, there is much overlapping of 4-year, 2-year,
and teachers’ college scores. Differential norms are a distinct aid
to educational counselors.


HOW GROUP INTELLIGENCE TESTS ARE USED 
IN THE SCHOOL 


Survey Measures 


In general, the group test of intelligence is used (1) to give 
an over-all measure of a child's abstract ability (often an IQ), 
(2) to provide a basis for educational counseling and guidance, 
and (3) to give a basis for prognosis. The total score on a group 
test is useful to the school administrator, the classroom teacher,
and the parent. Standard tests supply the school administrator
with a systematic record of how different schools, and classes
within a given school, compare in general ability to learn. The
classroom teacher gets needed information concerning the abilities
of individual pupils. Within a given class, the spread of
ability is often disturbingly wide. From the test scores a teacher
can tell whether John and Mary are doing the caliber of
work which can reasonably be expected of them, and whether he
is pitching his instruction at the comprehension level of the class
as a whole. Parents can plan the future education of their children
more intelligently when they know the level of performance
to be expected of them. And students can set their academic
and occupational goals more realistically when they are aware
of their strengths and weaknesses as shown by comparison of
their scores with norms for their age level.


Counseling, Guidance, and Prognosis 


The total score from a group test—the IQ or other type score 
—is most useful as a measure of a pupil’s over-all academic ability. 
For guidance and counseling the teacher can use to greater ad- 
vantage the sub-tests or part scores from the test battery. The 
profile of the California Test of Mental Maturity, for instance, 
has been especially designed for diagnosis. From the language 
and non-language IQ's, a teacher can judge whether a child is
predominantly “verbal-minded” or “object-minded”; and from
the five “factor” scores on the profile he can judge how proficient
a pupil is in memory, logical reasoning, verbal concepts,
spatial-perceptual relations, and numerical reasoning. In Figure
4-2, for example, low scores in reasoning and vocabulary indicate
poor academic ability—that is, the pupil lacks the ability
to solve problems efficiently by means of symbols (numbers
and words). High scores in these factors reveal good academic
aptitude and, when combined with other traits, suggest that the
pupil is capable of more advanced, and perhaps professional,
training. Low scores in spatial relations reveal little promise of
success in geometry, mechanical drawing, and perhaps manual
training. High scores here, plus high scores in the other factors,
forecast aptitude for engineering and architecture. The memory
factor is based on too meager a sample to provide a reliable
measure of a pupil's functional memory; the score here might
be significant, however, if very high or very low.

Part scores, like those from the CTMM, are helpful in giving 
the classroom teacher clues as to a child's abilities. But these
scores must always be interpreted with caution (page 68). The
sub-tests upon which such judgments are based are quite short
and are often too narrow to permit a broad prediction. Marked
differences in part scores should always be substantiated by
further investigation; they should jibe with other tests, with
grades, and with the teacher’s judgment from observation of the 
pupil's classroom work. 

The Otis Quick-Scoring Mental Ability Tests and the Kuhl- 
mann-Anderson Intelligence tests are primarily useful as over-all 
measures of the general level of mental functioning. The sub-tests
of the Kuhlmann-Anderson are fairly complex. The authors,
very wisely, do not recommend that specific scores (mental
ages) from sub-tests be interpreted as measuring definite psycho- 
logical functions. Wide variations in score from sub-test to sub- 
test for a given child may be significant, however, of gaps in 
training or in native ability. 

The Pintner-Cunningham Primary Test is most useful, perhaps,
in helping the teacher and parent decide whether a child
is mature enough mentally to do first-grade work. Entrance into 
first grade should not depend solely upon the MA or CA, how- 
ever. Children who are babyish in their social behavior and 
poorly developed physically are poor prospects for first grade, 
no matter how high their IQ's. 

Because of its high verbal content, the Terman-McNemar
Test of Mental Ability is one of the best predictors of high-school
achievement. The homogeneity of the sub-tests (their
high degree of relatedness) renders the test less useful for diagnosis
of a student’s strengths and weaknesses. The American
Council on Education Psychological Examination is a good predictor
of college work. This battery measures initiative in attacking
new problems, mental speed and facility, and good work
habits, as well as abstract ability. The ACE is also useful in guidance,
since it provides three scores—a quantitative (Q), a linguistic
(L), and a total. The ACE for high-school students is used as
a screening test for prospective college freshmen and as a basis
for counselling high-school students who plan to continue their
education beyond high school. The L score is perhaps most
predictive of general college work, because of the great importance
of reading comprehension in college courses. The Q score
has predictive value for science and mathematics, especially when
confirmed by other indicators (grades, teachers’ judgment).
The ACE (page 91) provides separate norms for 2-year,
4-year, and teachers’ colleges. The 4-year colleges have the
higher average scores, but variation in score from one type
of college to another is very large, as is also variation in score
within each college type. A student's chances of entering college
and staying there will depend to a considerable degree upon the
college he chooses (see page 115 for discussion of local and
nation-wide norms) or to which he is admitted. Only superior
students should be encouraged to apply to high-standard colleges,
and not all of these are good risks unless they have the
personal qualities to go along with academic potential. Good
personality and a capacity for hard work may not, in themselves,
enhance a student's chances of being accepted into an
A-grade college, but they will help him stay there once he is in.
Students with relatively low scores on the ACE may be quite successful
in colleges in which the scholastic standards are not
high. In any event, knowledge of his academic strengths and
weaknesses should be helpful to a student, whether he plans
further school work or not.

Norms for the ACE for high-school students are based upon
selected groups and may be much too high for all high-school
seniors. In fact, a PR on the ACE may be unfair (misleadingly
low) for the high-school senior of modest intellectual endowment
who does not plan to enter college. Such a youngster may
rank well up among 18-year-olds in the population but relatively
low among those of college caliber.
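The effect of the norm group on a PR can be shown by ranking one raw score against two groups. Both groups below are hypothetical: a broad sample of high-school seniors and a selected, college-bound group. The same percentile-rank helper (half of any ties counted, an assumed convention) is repeated so the block runs on its own.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group below `score`, counting half of any ties."""
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Two hypothetical norm groups: all seniors versus a selected college-bound group.
all_seniors = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
college_bound = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

pupil_score = 62
print("PR among all seniors:", percentile_rank(pupil_score, all_seniors))
print("PR among college-bound:", percentile_rank(pupil_score, college_bound))
```

The same raw score of 62 places the pupil at the 70th percentile of all seniors but only at the 20th percentile of the college-bound group.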


Limitations of the Group Intelligence Test 


Intelligence tests have definite limitations, and teachers and 
parents must not expect the impossible from them. For one 
thing, a group intelligence test cannot increase intelligence, as
parents sometimes seem to think it should. Again, a group-test
IQ is not necessarily a good measure of a pupil's drive to accomplish,
or of that dogged determination to stick to an unpleasant
task and see it through. Nor is a fairly high IQ (even a very high IQ)
always accompanied by emotional stability, good judgment, and
initiative. All these traits are related to good intellect, but the
relationship is by no means perfect. Many persons of average 
intellectual ability succeed in college, whereas many of greater 
potential fall by the wayside. Intelligence is a necessary, but 
is not a sufficient, attribute for high accomplishment in school 
or in life. 


WHAT TO LOOK FOR IN A GROUP 
INTELLIGENCE TEST 
The adequacy of a group intelligence test is judged by its 
validity, reliability, scoring methods, and norms. The object in 
giving the test, its cost, and such factors as time and personnel 
must also be considered. 


Validity. A test is valid, as we have noted, if it measures what it 
purports to measure (page 30). Group intelligence tests have 
been validated, in general, against various criteria judged to be 
indicative of intellect (page 31). Some of these criteria are 
school grades, ratings for ability, and other intelligence tests.
All such criteria are admittedly indirect and fallible; at the
same time, they represent measures with which any authentic
test of intelligence must correlate. Perhaps the best criterion of
the validity of a group test is its success in predicting performance
in tasks judged to require intelligence—in school, in business,
in the armed forces, or in a profession. Judged by correlational
criteria and predictive power, most of the widely used
group intelligence tests may be accepted as valid, though never
perfectly so.
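Validity coefficients and reliability coefficients are both product-moment correlations. The sketch computes Pearson's r from its definition. Paired with a criterion such as school grades it gives a validity coefficient; paired with a second form of the same test, as in the reliability example in the text, it gives an alternate-form reliability coefficient. The IQs are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical IQs of six pupils on Form A and, three months later, on Form B.
form_a = [108, 95, 120, 101, 88, 112]
form_b = [106, 97, 118, 104, 90, 110]
print(round(pearson_r(form_a, form_b), 2))
```

On Python 3.10 and later, `statistics.correlation(form_a, form_b)` computes the same coefficient.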


Reliability. We have already had occasion to use the term
reliability with reference to individual intelligence tests. If a
child earns an IQ of 108 on one form of a group test and three
months later achieves an IQ of 106, or 108, or even 110 on a
second form—that is, scores within a few points of the first
determination—and if most persons examined show similarly
consistent results, we regard the test as reliable. Reliability depends
essentially upon the stability or consistency of a score.
When properly given and scored, most standard group intelligence
tests are highly reliable.


Scoring. Group intelligence tests are first scored in arbitrary
points, one or more points being assigned to each correct answer.
Point or raw scores are frequently converted into MA’s
and IQ's. Such IQ's are related to, but are not equivalent to,
Stanford-Binet IQ's. Group-test IQ's are adequate for screening
and often are satisfactory for guidance; but the individual intelligence-test
IQ is a more searching and more nearly constant
measure of a child's talents (page 61). In addition to MA's
and IQ's, many group tests also provide PR's for raw or obtained
scores. These PR's are readily interpreted: they show where
the pupil ranks on a scale of one hundred points, that is, what
percentage of the norm group scored below him. If a high-school
senior has a PR of 85 on the L part of the ACE and a PR of
80 on the Q part, he should be a good risk for college work.

A second way of rendering the scores from different sub-tests
in a battery comparable is through the use of standard scores.
Point scores may be converted into a standard-score scale with
a convenient mean and σ. The sub-tests of a group intelligence
test usually differ in content, in length, and in difficulty. These
part-scores cannot be compared—or combined—as they stand.
But when converted into a common scale, they can be added to
give a total in which each sub-test has the same weight.
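Putting sub-tests on a common scale before adding them can be sketched as follows. The choice of a mean of 50 and a σ of 10 is an assumption for illustration (the text asks only for "a convenient mean and σ"), σ is computed here as the population standard deviation of the group tested, and the raw scores are hypothetical.

```python
from statistics import mean, pstdev


def to_standard_scores(raw, new_mean=50.0, new_sd=10.0):
    """Rescale raw scores to a chosen mean and standard deviation (sigma)."""
    m = mean(raw)
    sd = pstdev(raw)  # population sigma of this group's raw scores
    return [new_mean + new_sd * (x - m) / sd for x in raw]


# Hypothetical raw scores of four pupils on two sub-tests of unequal length.
vocabulary = [40, 30, 20, 10]  # long sub-test, wide spread
arithmetic = [6, 8, 2, 4]      # short sub-test, narrow spread

vocab_std = to_standard_scores(vocabulary)
arith_std = to_standard_scores(arithmetic)

# Summing raw scores lets the wide-spread sub-test dominate the total;
# summing standard scores gives each sub-test the same weight.
raw_totals = [v + a for v, a in zip(vocabulary, arithmetic)]
std_totals = [round(v + a, 1) for v, a in zip(vocab_std, arith_std)]
print(raw_totals)
print(std_totals)
```

The first two pupils differ by 8 points in raw total, almost all of it from the long vocabulary sub-test, yet their standard-score totals are equal: each stands as far above the mean on one sub-test as the other does on the other.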


Norms. Norms (page 40) are typical measures of achieve- 
ment. Norms may be nation-wide or local (page 115). Local 
norms are often fairer for a given group in that they take into 
account the conditions within a given city or state. National 
norms are most useful for wide comparisons and as standards at 
which to aim. Norms for college freshmen will generally be
much too high for high-school graduates in general. College
freshmen are a selection of high-school graduates according to
academic proficiency. Group intelligence-test norms are usually
given in terms of age level, but they may be in terms of grade
level.

FIGURE 4-4 Norms for Various Occupational Groups on the
Army General Classification Test

Chart of AGCT standard scores (10th, 25th, 75th, and 90th percentiles) for each civilian occupation, in the order charted: accountant, medical student, teacher, lawyer, bookkeeper (general), stenographer, reporter, clerk (general), purchasing agent, salesman, telephone repairman, artist, instrumental musician, toolmaker, printer, machinist, policeman, sales clerk, electrician, machinist's helper, welder (combination), plumber, carpenter (general), automobile repairman, tractor driver, painter (general), heavy truck driver, cook, laborer, barber, miner, farm worker; men in general are shown for comparison.

Reproduced by permission of Harper & Brothers.

Figure 4-4 shows the norms for certain occupations on the 
Army General Classification Test, used in the armed forces in 
World War II. The higher scores are achieved by men with 
the most extensive training, and are probably the resultant of 
both intelligence and training. The more intelligent men are able 
to undertake the more exacting training, and this training enables 
their native talent to express itself. It is interesting to note the 
large degree of overlapping in score from one occupation to 
another. It seems evident that many men are functioning at a 
level below their native capacity. 

The educational expectation of a child whose group-test IQ
is 90, 100, or 115 may be read with sufficient accuracy for most
purposes from Table 3-3.

Other Factors Which May Govern the Choice of a Group Intelli- 
gence Test. In addition to the formal requirements to be met by a 
group test discussed above, there are other considerations which 
enter into the suitability of a test for a given school system. 
Among the more important are time available for testing, per- 
sonnel, cost, and acceptability. Catalogues provide data on cost 
and time allowances—most testing periods are set to fit com- 
fortably into a class period. In most cases, teachers can administer 
group tests with a minimum of instruction, and scoring can be 
done with stencils. Acceptability of a test depends on whether 
the teachers and the community look with favor upon standard 
tests. Much of the disfavor with which parents once regarded
mental tests has fortunately disappeared, though one still encounters
skepticism as to their value. In initiating a testing program
it is always wise to avoid tests which contain what appear to be
trick items and those which resemble puzzles. Such tests are
likely to be labeled frivolous by teachers and parents. Some
parents still think that the object of a mental test is to describe
their children as dull or mentally abnormal. When they see the
value of a standard test in providing a better understanding of a
child's capabilities, their objections disappear. From the catalogues
listed on page 253, the teacher or administrator should be
able to find the test suitable for a given situation.


SUGGESTIONS FOR FURTHER READING 


Cronbach, L. J. Essentials of Psychological Testing. New York: 
Harper, 1949. 

Freeman, F. S. Theory and Practice of Psychological Testing (Rev. 
edition). New York: Holt, 1955. 

Goodenough, F. L. Mental Testing. New York: Rinehart, 1949.

Noll, V. H. Introduction to Educational Measurement. Boston: Hough- 
ton Mifflin, 1957. 

Thorndike, R. L. and Hagen, E. Measurement and Evaluation in
Psychology and Education. New York: John Wiley, 1955. 


SUGGESTIONS FOR LABORATORY WORK 


1. Administer three or four standard group tests of intelligence to the
class and have the students score their own papers. If the test is for 
young children, cut the time limits in half. 

2. Select one of the tests taken in (1). Examine the Manual for the 
author’s treatment of validity, reliability, scoring methods, and norms. 
Summarize these data. 

3. In another of the tests from (1), count the number of items which, 
in your opinion, are verbal, numerical, and spatial-perceptual. In which 
group did you do best? Worst? Does your result jibe with what you 
know about your abilities? 


QUESTIONS FOR DISCUSSION 


1. Is a group test of intelligence anything more than a scholastic
aptitude test? What else does it add to your knowledge of a pupil?

2. Why is the score on a reliable intelligence test usually a better
estimate of a pupil's ability than is the rating of the teacher?

3. Why do we get different IQ's for the same pupil from different
intelligence tests?

4. Is the group test of intelligence more useful in an academic than
in a vocational high school?

5. Suppose you are a sixth-grade teacher. You have administered a
standard group-intelligence test to your class. What uses do you think
you might make of a knowledge of these children's IQ's?

6. A pupil has taken the CTMM (page 84). In counseling this child,
what help might you get from a wide difference in his language and
non-language IQ's?




CHAPTER 5 


EDUCATIONAL ACHIEVEMENT TESTS


The purpose of the educational achievement test—like that of 


pupil knows about the subjects he has studied or is studying. 
Both the general intelligence test and the educational achieve- 
ment examination measure aptitude for school work ("abstract 
intelligence"). The difference between the two is one of empha- 
sis rather than of purpose. The intelligence test, as we have 
seen, tries to gauge mental alertness apart from specific school 
knowledge—that is, it is concerned primarily with the efficiency
of mental processes as exhibited in problems which demand
learning ability, perceptual keenness, memory, reasoning, and the
like. The educational achievement test is also concerned with
mental processes, but only insofar as they are demonstrated in
a student's performance in English composition, arithmetic, history,
or science.

The distinction between the two sorts of test is not always 
clean-cut, and there is much overlap in content and in abilities 
called upon. All intelligence tests depend in some degree on 
previous learning, and all educational tests depend in some part on 
native keenness. Educational achievement tests predict future
school performance as well as or better than intelligence tests.
Achievement in the elementary school, for example, forecasts
achievement in high school; and performance in arithmetic predicts
later performance in algebra. But prediction is strengthened
when an intelligence test is added to the achievement battery.
Perhaps the general intelligence test is most useful when we want
an estimate of potential aptitude, the achievement test when we
want a measure of present school standing and probable success
in later school work. Both tests provide valuable information, and
each supplements the other.
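The gain from adding an intelligence test to an achievement battery can be illustrated with the standard formula for the multiple correlation R of a criterion y with two predictors: R² = (r1y² + r2y² − 2·r1y·r2y·r12) / (1 − r12²). The three correlations used below are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt


def multiple_r(r1y, r2y, r12):
    """Multiple correlation of a criterion y with two predictors.

    r1y, r2y: correlation of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    r_squared = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
    return sqrt(r_squared)


# Hypothetical correlations: achievement battery with later grades (.60),
# intelligence test with later grades (.50), and the two tests with each other (.40).
combined = multiple_r(0.60, 0.50, 0.40)
print(round(combined, 2))
```

With these values the two tests together predict grades with R of about .66, compared with .60 for the achievement battery alone.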

Educational achievement tests are useful (1) for survey purposes—that
is, to determine a class’s standing in relation to some
norm, and (2) for guidance and evaluation—that is, to provide a
clearer understanding of what individual pupils have learned—or
failed to learn—in specific school subjects. A better understanding
of strengths and weaknesses is a major objective of a testing program.
Remedial work can be undertaken more intelligently and
teaching improved when we know what errors a pupil is making
consistently and what misconceptions and gaps in training led to
these errors.

Achievement tests are often used for sectioning pupils in order
to improve working conditions within the classroom. Thus pupils
may be classified into high, average, and low ability groups on the
basis of over-all educational standing, or divided within a
grade into fast, medium, and slow learners. Prediction of later
school success on the basis of educational achievement tests is
considerably more accurate than are forecasts based on conventional
school marks.
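Sectioning by over-all standing amounts to ranking the pupils and cutting the list into groups. Equal thirds are an assumption made here for simplicity; the text does not say where the group boundaries should fall, and a school might set them by score or by class size instead. Names and scores are hypothetical.

```python
def section_pupils(scores, labels=("high", "average", "low")):
    """Split pupils into (nearly) equal-sized groups by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = {label: [] for label in labels}
    for position, pupil in enumerate(ranked):
        # position * number_of_groups // number_of_pupils gives 0, 1, or 2 for thirds.
        groups[labels[position * len(labels) // len(ranked)]].append(pupil)
    return groups


# Hypothetical over-all achievement scores for six pupils.
scores = {"Ann": 88, "Ben": 72, "Carl": 95, "Dora": 60, "Ed": 81, "Fay": 67}
print(section_pupils(scores))
```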


THE SUPERIORITY OF STANDARD ACHIEVEMENT 
TESTS OVER ROUTINE EXAMINATIONS 


Standard achievement tests are superior to teacher-made tests 
in three principal respects. 


1. The Achievement Test Is Better Planned. The usual teacher-made
test in algebra or French is composed of questions and
problems covering topics which one teacher believes worth knowing
in his subject. Usually materials are drawn from a single
textbook. Such a test is valuable as a measure of progress in learning,
but it is not very broad in coverage and does not permit
comparisons with the achievement of students in other schools. 

The standard educational achievement examination, on the 
other hand, is compiled after an analysis of many widely used 
textbooks and various courses of study and sets of examinations.
Thus it represents a consensus—the pooled judgment of many
competent teachers and testing specialists. Drawing materials
from many sources insures a representative sampling of subject
matter. Occasionally a teacher will complain that a general
achievement test contains questions about topics or books (in
English literature, for example) which his class has not studied,
and that on this account the test is unfair. This is often true, but
the criticism is not as damaging as it may seem. Few classes have
covered equally well all of the topics treated in a comprehensive
achievement test. Some teachers will have emphasized one topic,
some another, but by and large these inequalities will even up for
the test as a whole. Rarely will a school have a general and marked
advantage (or disadvantage) over another school in educational
experience, unless the teaching, the curriculum, and/or the caliber
of the students are exceptionally good or poor. When gross
inequalities are revealed in test scores, the reason for such differences
should be sought. It hardly seems wise on that account to
abandon the test.


2. The Achievement Test Is More Objective. The standard 
achievement test is more objective than the teacher-made ex- 
amination. This means that in an achievement test, grades received 
by students depend to a minimum degree on the personal opin- 
ions, likes, and dislikes of the scorer. In the traditional essay 
examination, a high degree of subjectivity is almost inevitably 
present: the mark given an answer depends on what one teacher 
regards as important and significant.


3. The Achievement Test Lays Down More Exact Specifications. 
The educational achievement test is more logically planned than 
the ordinary teacher-made examination, because makers of stand- 
ard tests draw up specifications for an examination. These lists 
are often lengthy and quite specific, but in general they can be
reduced to two—knowledge and application. Thus test items are
selected to reveal a pupil’s information and understanding of
facts, as well as his acquired skills in, for example, reading or
arithmetic. Again, items are chosen to reveal a pupil’s ability to
apply known principles, to interpret, to draw conclusions from
given data, and to solve problems. The second of these specifications
is the more important, but the first is not to be dismissed lightly
as being a matter of “mere memory.” Students cannot write
good English prose, nor can they read difficult passages in history
and literature, without adequate vocabulary. Even in so “logical”
a subject as mathematics, a student cannot solve “originals” in
geometry (no matter how bright he is) unless he knows the
preceding propositions. Rote memory, of course, is rarely
enough. The older spelling bees tested how many detached and
isolated words a child could spell—though often he had little idea
of what the words meant. Modern spelling tests try to discover
whether a child can spell a word and also knows its meaning well
enough to use it correctly in a sentence—that is, in context. The
second method (application as well as knowledge) provides a
better measure of a child's usable vocabulary.


GENERAL EDUCATIONAL ACHIEVEMENT 
BATTERIES 


The present section will describe five representative achievement
test batteries (chosen from many) which are designed to
measure general educational achievement in the elementary
grades and in the high school.

1. The Stanford Achievement Test (SAT)*

2. The Metropolitan Achievement Tests (MAT)

3. The California Achievement Tests (CAT)

4. The Cooperative General Achievement Tests (GAT)

5. The Sequential Tests of Educational Progress (STEP)


All these tests make some provision for the analytic study of a 
student's strong and weak points through a comparison of sub-test 
scores. Part scores are often represented comparatively on a 
graph or profile. 


The Stanford Achievement Test (SAT)** 


Description. The SAT consists of overlapping sub-tests 
grouped at four ability levels from grade 2 through grade 9. All 
four of the batteries contain three tests of paragraph meaning, 
word meaning and spelling (these are essentially measures of 
language skills); and two tests of arithmetic reasoning and 
arithmetic computation (number or quantitative skills). All these 
tests are multiple-choice in form. In addition to these five sub-tests,
the Intermediate Battery (for grades 5 and 6) and the Advanced
Battery (for grades 7, 8, and 9) include four other tests:
language, social studies, natural science, and study skills. The
Language Test contains items in capitalization, punctuation, and
sentence structure. The Social Studies Test covers fundamentals
of history, geography, and civics. The Study Skills Test is an
ingenious attempt to discover how well a student reads maps,
interprets graphs and tables, and uses references. This information
is important to the teacher, since many pupils regularly skip
all tables and graphs unless supervised.

* These batteries are often referred to in abbreviated form by the capital letters.
** Published by the World Book Company, Yonkers, N. Y.

There are five forms for each battery. The Primary Battery
is printed in a single booklet of eight pages and takes a little more
than two hours to administer. Figure 5-1 shows some of the items
from the Primary Battery. The Elementary Battery (for grades 3
and 4) contains six sub-tests: paragraph meaning, word meaning,
spelling, arithmetic reasoning, arithmetic computation, and language.
The Intermediate Battery requires almost four hours and
will, of course, have to be spread over several class periods. The
authors of the tests have drawn up a convenient testing schedule,
with approximate times for each sub-test.

FIGURE 5-1 Sample Items from the Stanford Achievement
Test, Primary Battery, Form K

Test I. Paragraph Meaning.
Directions: “Find the one word that belongs in each space, and draw a line
under the word. Do not write in the spaces.”

Baby pets me.
I drink milk.
I say “Mew, mew.”
I am a ______.

cow kitten pony child

Test IV. Arithmetic Reasoning.
Directions: “Now look at the pictures. Put your finger on the little chair
in the top box. That is right. Next to the little chair are some candles.
Put a cross on the shortest candle. Make a mark like this.” (Illustrate
on the board, making a large X.)

“Do you see the row of clocks? Put a big cross on the clock that says it is noon.”

Reproduced by permission of the World Book Company.


Scope. The scope of the SAT is as follows: 


1. Primary Battery: end of grade 1, grade 2, and first half of
grade 3

2. Elementary Battery: grades 3 and 4

3. Intermediate Battery: grades 5 and 6

4. Advanced Battery: grades 7, 8, and 9


These four achievement tests cover the fundamentals taught in 
most schools over the elementary grades through grade 9. 


Scoring and Norms. All of the sub-tests are objective in form,
so that scoring can be readily accomplished by stencils or
scoring keys. Raw scores can be converted into grade equivalents,
and sub-test scores into percentiles as well.
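The teacher reads percentile norms from the Manual's tables rather than computing them. As a minimal sketch of what such a table encodes, the following assumes the common midpoint definition of percentile rank (the per cent of the norm group scoring below a score, plus half of those tying it) and an invented ten-pupil norm group; it is not the SAT's own norming procedure.

```python
def percentile_rank(score, norm_scores):
    """Per cent of the norm group scoring below `score`, counting half of the ties."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)

# Invented raw scores for a small norm group on one sub-test.
norm_group = [12, 15, 15, 18, 20, 22, 22, 22, 25, 29]

print(percentile_rank(22, norm_group))  # 65.0
```

Five pupils score below 22 and three tie with it, so the rank is (5 + 1.5) / 10, or 65.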

There are two types of norms. The first, called the modal-age 
grade norm, is recommended for individual diagnosis, that is, 
for evaluating the scores of individual pupils. From tables in the 
Manual, a pupil's scores can be compared with those earned by 
children who are typical for age and grade. A second norm, the 
total-group grade norm, is based upon the performance of all 
children in a given grade. These norms, given in tables in the
Manual, are recommended by the authors when one wishes to 
evaluate a class average. Raw scores on the sub-tests are con- 
verted into standard score units so that they may be combined 
and compared (page 38). 
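To see why raw scores must be put into standard-score units before they are combined, here is a small sketch. The sub-test means and standard deviations are invented for illustration, and a plain z-score, (raw - mean) / SD, stands in for whatever scale the SAT tables actually use.

```python
from statistics import mean

# Invented norm-group statistics (mean, standard deviation) for three sub-tests.
NORMS = {
    "paragraph meaning": (30.0, 8.0),
    "spelling": (45.0, 12.0),
    "arithmetic computation": (20.0, 5.0),
}

def z_score(raw, norm_mean, norm_sd):
    """Standard score: distance of a raw score from the norm mean, in SD units."""
    return (raw - norm_mean) / norm_sd

def composite(raw_scores):
    """Average of the sub-test standard scores, so each sub-test counts equally."""
    return mean(z_score(raw, *NORMS[name]) for name, raw in raw_scores.items())

pupil = {"paragraph meaning": 38, "spelling": 45, "arithmetic computation": 15}
for name, raw in pupil.items():
    print(name, z_score(raw, *NORMS[name]))
print("composite", composite(pupil))
```

The raw total of 98 would be dominated by spelling, the sub-test with the largest mean and spread. In standard-score units the pupil's strong paragraph meaning (+1.0) and weak computation (-1.0) cancel, giving a composite of 0.0.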

The validity of the SAT is high. The tests possess content
validity, and the correlations of the batteries with grades and
other criteria demonstrate excellent predictive validity. The
reliability of the various batteries is also satisfactory.


The Metropolitan Achievement Tests (MAT)* 


Description. The MAT includes five test batteries with a range 
from grade 1 through the first half of grade 9. All of the test 
batteries contain sub-tests of reading and arithmetic; spelling is 
added after grade 2, and language usage after grade 3. At the 
intermediate and advanced levels there are ten sub-tests in all: 
reading, vocabulary, arithmetic fundamentals, arithmetic prob- 
lems, English, literature, social studies (history), social studies 
(geography), science, and spelling. In addition to the complete 
batteries, partial test batteries are available for use at the
intermediate and advanced levels. These include the skill subjects—
reading, arithmetic, English, and spelling—plus vocabulary and 
arithmetic problems. All tests at a given level are printed in a 
single booklet. 

The MAT provides a comprehensive survey of a pupil’s
educational attainment. Moreover, the profile chart (see Figure 5-2)
printed on the last page of the test booklet and the class ability 
sheet allow the teacher to identify the student’s weak points, to 
correct errors consistently made, to study a pupil's rate of 
progress from time to time, and to group pupils for instruction 
or review. Tests in arithmetic and reading are available as sep- 
arates and may be used when it is not feasible to administer the 
whole battery. 

Scope. MAT includes the following batteries:

1. Primary Battery I: grade 1 and beginning grade 2
2. Primary Battery II: grade 2 and beginning grade 3
3. Elementary Battery: grades 3 and 4 and beginning grade 5
4. Intermediate Battery: grade 5 up to the first half of grade 7
5. Advanced Battery: grade 7 up to the first half of grade 9

MAT covers a wide range of material taught in grades 1-9. Test
batteries require from one hour (primary) to about four hours
(advanced).

* Published by the World Book Company, Yonkers, N. Y.


FIGURE 5-2 Profile Chart for the Metropolitan Achievement
Tests

Scoring and Norms. The MAT is easy to administer and to
score. There are three types of norms: age, grade, and percentile.
Norms are given also in a standard score scale which is based
on the assumption of a normal distribution of test ability in the
sixth grade. Standard or scaled scores are comparable from
battery to battery in the same subject, but not from test to test
within a given battery. Figure 5-2 shows the profile of George
Ferguson, who is 11 years and 8 months old. George is a
sixth-grade student, and the MAT was administered on February 6,
when he was midway through the grade (that is, at 6.5). George's
scores on the ten sub-tests have been converted into age
equivalents from the appropriate tables in the “Key and
Directions for Scoring.” His subject ages (also called educational
ages, or EA's) have been entered on the chart and joined by short
straight lines to give the profile of his school achievement. A
straight line drawn horizontally across the chart through George's
chronological age of 11-8 shows immediately in which subjects
he is above or below the scores typical for his age level.
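Reading the profile against the horizontal line amounts to subtracting chronological age from each subject age. Here is a minimal sketch, with ages written in the book's years-months form. The subject ages are hypothetical, since George's actual EA's appear only on the chart.

```python
def to_months(age):
    """Convert an age written 'years-months' (e.g. '11-8') into months."""
    years, months = (int(part) for part in age.split("-"))
    return 12 * years + months

def compare_to_age(subject_ages, chronological_age):
    """Months by which each subject age (EA) lies above (+) or below (-) the CA."""
    ca = to_months(chronological_age)
    return {subject: to_months(ea) - ca for subject, ea in subject_ages.items()}

# Hypothetical subject ages for a pupil aged 11-8.
subject_ages = {"reading": "12-10", "arithmetic": "10-11", "spelling": "11-8"}
print(compare_to_age(subject_ages, "11-8"))
```

A positive entry is a point above the horizontal line on the chart, and a negative entry is a point below it.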

George's raw scores were converted into age- instead of
grade-equivalents. These EA's show whether George is
accelerated or retarded as compared with children of his own age.
EA's are useful in guidance. Grade equivalents give the grade
levels to which various scores correspond. A profile plotted from
grade-equivalents tells us whether a pupil is above or below his
present grade level in his various subjects. Both norms are useful.
Grade norms are especially useful when comparisons with national
or local norms are to be made; age norms are most useful when
diagnosis of a pupil's strengths and weaknesses is wanted.
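A February testing gives a placement of 6.5 because the school year is counted in tenths. The sketch below assumes a ten-month year running from September (.0) through June (.9). It shows how a grade-equivalent profile is read against the pupil's present placement, using invented grade equivalents.

```python
# Assumed convention: a ten-month school year beginning in September, each
# school month adding one tenth of a grade (September = .0, February = .5).
SCHOOL_MONTHS = [9, 10, 11, 12, 1, 2, 3, 4, 5, 6]  # September through June

def grade_placement(grade, month):
    """Present grade placement, e.g. grade 6 in February -> 6.5.

    Raises ValueError for July or August, which fall outside the school year.
    """
    return grade + SCHOOL_MONTHS.index(month) / 10

def above_or_below(grade_equivalents, grade, month):
    """Grade-equivalent score minus present grade placement, for each subject."""
    placement = grade_placement(grade, month)
    return {subject: round(ge - placement, 1)
            for subject, ge in grade_equivalents.items()}

print(grade_placement(6, 2))  # 6.5
# Hypothetical grade equivalents for a sixth-grader tested in February.
print(above_or_below({"reading": 7.2, "arithmetic": 5.9}, 6, 2))
```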

Both the validity and the reliability of the MAT are satisfactory
as judged by the usual criteria.


The California Achievement Tests (CAT)*

* Published by the California Test Bureau, Los Angeles, Calif.

Description. The CAT have been organized into five batteries
designed to cover the ability range from grade 1 to college. The
tests are survey in nature and are concerned primarily with skills
in six areas: reading vocabulary, reading comprehension,
arithmetic reasoning, arithmetic fundamentals, mechanics of English,
and spelling. The authors of CAT believe that tests in these areas
are more valuable than are tests in such subjects as social studies,
where the content varies widely from school to school. The
California Tests emphasize power rather than speed, the time
required for the Elementary Battery being more than two hours.
CAT stresses the use of the separate tests in diagnosis. Except
for spelling, the tests in the six areas are
subdivided into sections, each dealing with some important aspect
of the subject. For example, in the Elementary Battery, reading
comprehension (Test 2) is analyzed into (1) following
directions, (2) reference skills, and (3) interpretation of material.
Test 3, arithmetic reasoning, is broken down into (1) meanings,
(2) signs and symbols, and (3) problems. Scores from each of
these sub-divisions are plotted on a profile like that of Figure 5-2,
usually in grade-equivalent units. The analysis of a pupil’s
performance is carried still further by a second grouping together
of items which presumably measure essentially common functions.
Thus within the division of punctuation under Test 5,
mechanics of English, items are grouped into those which
involve commas, periods, question marks, and quotation marks. Under
the heading of addition in Test 4, arithmetic fundamentals, items
are grouped under zeros, carrying, fractions, and decimals. The
number of item classifications under a given test varies from 50
to more than 100. A special chart enables the scorer to analyze
the pupil’s achievement over a wide range of these elements.
Careful examination of specific item-groups may, to be sure,
reveal why a pupil fails consistently to use decimals correctly
or to understand fractions; or it may tell us where he is weak in
punctuation, or in vocabulary, or in spelling. The CAT at least
makes an attempt to keep the individual pupil from being lost in
an “average.” At the same time, it must be remembered that
individual diagnosis based on a few items is always tentative and
may be misleading.
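The item-group analysis can be pictured as a simple tally. In this sketch the mapping of items to groups is a hypothetical stand-in for the CAT's own chart. Groups with fewer than three items are flagged, in keeping with the warning that diagnosis from a few items is tentative.

```python
from collections import defaultdict

# Hypothetical classification of seven punctuation items into item-groups.
ITEM_GROUPS = {1: "commas", 2: "commas", 3: "periods", 4: "question marks",
               5: "commas", 6: "quotation marks", 7: "periods"}

def group_results(correct_items):
    """Return {group: (number right, number of items)} for one pupil."""
    tally = defaultdict(lambda: [0, 0])
    for item, group in ITEM_GROUPS.items():
        tally[group][1] += 1
        if item in correct_items:
            tally[group][0] += 1
    return {group: tuple(counts) for group, counts in tally.items()}

results = group_results({1, 3, 4, 7})
for group, (right, total) in results.items():
    caution = f"  (only {total} item(s): tentative)" if total < 3 else ""
    print(f"{group}: {right}/{total}{caution}")
```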


Scope. The CAT consists of the following batteries, which 
cover the educational levels described. 


1. Lower Primary: grades 1 and 2 
2. Upper Primary: grades 3 and 4 
3. Elementary: grades 4, 5, and 6 
4. Junior High: grades 7, 8, and 9 
5. Advanced: grades 9 to 14 


Scoring. Raw scores on the tests may be converted by tables
into age, grade, and percentile-within-grade norms. The sub-tests
are objective in form, easy to administer, and easy to score. The
six tests of the batteries have satisfactory reliability, but the
reliabilities of the various sub-divisions are quite low because of the
few items included in some groupings (often only one or two).
Validity is high for the whole test.
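The Spearman-Brown formula shows why short groupings are unreliable. It is a standard psychometric result, not one cited in the CAT materials described here, and it predicts the reliability of a test made k times as long: r_k = k r / (1 + (k - 1) r). The numbers below are hypothetical. The formula also assumes that the short grouping is made up of items comparable to those in the whole test.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test with reliability r is made k times as long."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical: the whole test has 40 items and a reliability of .90.
whole_r, whole_items = 0.90, 40
for part_items in (2, 8, 40):
    k = part_items / whole_items  # a part is a fraction of the whole test
    print(part_items, "items:", round(spearman_brown(whole_r, k), 2))
```

On these assumptions, a two-item grouping drawn from a test reliable at .90 would have a reliability of only about .31.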


Co-operative General Achievement Tests (GAT)* 


Description. These achievement tests deal with three fields or
areas: Test I covers social studies; Test II, natural sciences; and
Test III, mathematics. Each test battery consists of two divisions:
Part I, which deals with fundamental terms, concepts, and
definitions; and Part II, which covers applications of knowledge,
interpretation, and comprehension. The battery has been planned
for grades 10, 11, and 12, but it is probably too difficult for all
but superior tenth and eleventh graders. The battery is objective
in form throughout.

Scope. GAT is a power test designed for the upper school
grades and for college freshmen. Each test requires from 40 to
60 minutes.

Scoring. The tests are all multiple-choice, and are easy to
administer and to score. Items are graphic, pictorial, and verbal.
Norms in scaled scores and percentiles are given for high-school
students and college freshmen. GAT is probably most useful in
the counseling of high-school students as to the subject fields in
which they show the greatest promise.

* Published by the Educational Testing Service, Princeton, N. J.


The Sequential Tests of Educational Progress (STEP)* 


Description. As the term “sequential” implies, this battery is
designed to measure a student’s progress in learning as he goes
from the elementary grades to college. The tests deal with
critical skills in seven academic areas: essay tests, planned to
provide standardized tests in writing prose; listening
comprehension tests, in which the examiner reads a passage and asks
questions designed to call out comprehension, interpretation, and
evaluation; reading tests, covering a wide range of content; writing
tests, planned to measure the student’s ability to express ideas;
mathematics tests, which contain items over a wide range of
subject matter and difficulty; science tests, dealing with the
application of scientific knowledge to a variety of situations; and
social studies tests, designed to show progress in social and civic
development.


Scope. STEP is designed to measure achievement over the 
following levels: 


Level 1—freshmen and sophomore years of college 
Level 2—grades 10, 11, and 12 

Level 3—grades 7, 8, and 9 

Level 4—grades 4, 5, and 6 


It should be noted that Level 1 is the highest level academically, 
Level 4 the lowest. STEP attempts to reveal continuity in mental 
growth and learning from the bottom to the top level. 


Scoring and Norms. There are two equivalent forms (A and B) 
of each test in STEP except the essay tests, for which there are 
four forms. There are grade and percentile norms. Scoring is by
stencils. A profile chart allows the examiner to analyze a pupil's
performances on the several functions measured by the battery. 


GENERAL ACHIEVEMENT TESTS 
IN THE SCHOOLS 


We have seen how the general educational achievement test
gives the academic level of a pupil or of a class, and how the test
profile reveals strengths and weaknesses in a variety of subjects
and processes. Further illustration of how educational
achievement tests may be utilized in (1) evaluation, (2) diagnosis, and
(3) prediction will be given in this section.


Evaluation. Suppose that Miss Clark has given the SAT to her 
sixth-grade class of twenty-six pupils. She finds her class mean 
(average) on the test battery to be about equal to the local norms 
for the sixth grade, but slightly below the national norm as given 
in the Manual. Does this result mean that Miss Clark is doing a
poor job because local norms are less valuable than national? The
answer is No, since a number of factors affect achievement in a
given school system or a single school, and some of these may
cause local norms to be lower or higher than national. Among
these factors are the following:


1. Retardation as a consequence of strict promotional
standards and practices. Much retardation will lower local norms,
whereas the weeding out of poor students (by transfer to
special classes, for example) will raise local norms.

2. Promotion by age irrespective of achievement. This fairly
common practice will lead to a progressive lowering of
local grade norms.

3. Previous experience of pupils with standard objective tests.
This factor varies widely and often affects local norms.

4. Coaching in the tests themselves. Sometimes teachers coach
pupils in materials akin to or identical with those found in
the tests. “Teaching for the tests” is bad practice and should
be discouraged whenever possible. Coached pupils usually
raise the class's performance.

5. Selection. Children from a poor socio-economic back- 
ground generally score lower on standard tests, whereas 
children from good neighborhoods score higher, especially 
on the verbal tests. 

6. Motivation. Children do not try hard on tests if the teacher’s 
attitude is negative, or if the parents think achievement 
tests are worthless—and say so loudly and often. 

7. Transfers, drop-outs. These children may affect local 
norms, usually adversely. 


In some private schools in which pupils are generally of high 
caliber because of stringent selection procedures, local norms will 
often be found to be considerably above national norms based 
on public school results. In a large city system we can expect an 
occasional sixth-grade class to fall below national norms even 
when the city as a whole is up to national standards. But when 
a number of classes fall below national standards, the curriculum,
the teaching methods, the promotional standards, and other
conditions in the school and the community should be examined.


Diagnosis. In looking over her test results for the sixth grade,
Miss Clark may find that Harry is far below the sixth-grade norm
in reading and that Sue is below the norm in arithmetic. At the
same time, Mary reads at eighth-grade level, and John (the
youngest child in the class) is up to the ninth-grade norm in
science. Individual differences like these are the rule rather than
the exception in most elementary classes. It is fairly easy for Miss
Clark to prescribe further reading for Mary, and to stimulate John
to carry out an individual project in science, for example,
classifying the birds in the local community. The below-average
children often present real problems, and as a result they are given
more of the teacher's time and effort than the bright children.
If the extra time which Miss Clark can devote to Harry and Sue
is insufficient to bring these children up to the sixth-grade levels
in reading and arithmetic, they should be referred to special
classes, if such are available. The larger the number of
below-average children, the more difficult is Miss Clark’s task, and the
more likely she is to neglect the bright children.

It should be noted, as a further point, that the printed norm 
(local or national) does not necessarily establish the optimum 
level of performance for every pupil in the sixth (or any other) 
grade. If Norman, whose IQ is 120, is just on the sixth-grade 
norm in reading and arithmetic, he is not performing up to ex- 
pectation—his scores should be above the norm for his grade. On 
the other hand, if Bill, whose IQ is a modest 94, is at or above 
the norm for the sixth grade in reading and arithmetic, he is 
actually doing better than we can reasonably expect of him. The 
intelligence of the child must always be considered in deciding 
whether his school work is “normal” for the grade. 
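One way to make "performing up to expectation" concrete is to compare educational age with mental age. This sketch assumes the ratio definition IQ = 100 × MA / CA and takes mental age as a rough yardstick for expected educational age. Both are simplifying assumptions, not a rule stated in the text. The ages are invented: Norman and Bill are given the same age and the same at-norm achievement, so that only IQ differs.

```python
def mental_age(iq, ca_months):
    """Ratio-IQ definition IQ = 100 * MA / CA, solved for MA (in months)."""
    return iq * ca_months / 100

def achievement_vs_ability(ea_months, iq, ca_months):
    """100 * EA / MA: near 100 when achievement matches ability, lower when it lags."""
    return round(100 * ea_months / mental_age(iq, ca_months))

# Hypothetical: both boys are 11 years 6 months (138 months) old, and both
# score at a typical sixth-grade level, taken here as an EA of 138 months.
norman = achievement_vs_ability(138, iq=120, ca_months=138)
bill = achievement_vs_ability(138, iq=94, ca_months=138)
print(norman, bill)
```

Norman comes out near 83 and Bill near 106, which reproduces the contrast drawn in the text: the same at-norm score is underachievement for one boy and better than expected for the other.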

Sometimes Miss Clark will suspect from a pupil’s sullen be- 
havior, or open aggressiveness, or his tendency to whimper at 
the slightest provocation that emotional factors are causing or 
contributing to his difficulties in school. Such a pupil should be 
referred to the school psychologist (if there is one) or to the 
school physician. The clinical psychologist is often able through 
tests and interviews to get a clearer idea of a pupil's difficulties
than can the teacher. The teacher should visit a child’s home if
she suspects that parents and home environment are involved, as
they often are. Corrective measures (when possible) can be more
intelligently applied when causal factors making for undesirable 
conduct and/or poor school work are known, rather than sur- 
mised from superficial impressions. 

Prediction. Whether it will be profitable for a student to take
science or mathematics in high school or college can be forecast
with considerable assurance from his performance on standard
tests. Prediction of later success is usually improved when tests
given in elementary schools are combined with a good
intelligence test. Intelligence and achievement tests are regularly
utilized in many schools in the selection and placement of
students in courses of study. The combination of achievement tests
and special aptitude tests is valuable in predicting a student's
success in a professional school, in law or medicine, for example.


ACHIEVEMENT TESTS IN SPECIAL 
SUBJECT AREAS 


In the preceding section, we described five general achieve- 
ment batteries designed to assess academic standing in school. In 
the present section, we shall consider several representative sub- 
ject-matter achievement tests. These include tests of reading and 
arithmetic, as well as tests planned to determine mental maturity 
(readiness) and proficiency in special subjects. Of the various 
subject-matter tests, those in reading and arithmetic are most
often given, since they represent fundamental skills upon which 
school achievement largely depends. Subject-matter tests are 
found, of course, in the general achievement batteries, as well as 
in separate form. The tests listed below were selected as being 
typical of a very large number available. 


Metropolitan Readiness Tests
Iowa Silent Reading Tests
Co-operative Mathematics Test
Evaluation and Adjustment Series
Co-operative French Test (elementary)
Co-operative Science Test


Metropolitan Readiness Tests* 


Description. The primary objective of these tests is to find
whether a child is sufficiently mature to undertake the study of
reading. But the tests are concerned also with “readiness” for
arithmetic, and with general physical and mental maturity. The
six tests in the battery may be described as follows:

(1) Word Meaning: child selects picture named by the
examiner.

(2) Sentences: same as (1) except that the examiner uses
sentences and phrases instead of single words.

(3) Information: child marks the picture corresponding to the
examiner's oral description.

(4) Matching: child must recognize similarities and differences
in pictures, geometrical forms, numbers, letters, words.

(5) Numbers: child must demonstrate a knowledge of number
concepts and carry out simple operations.

(6) Copying: child is required to copy simple graphic forms,
as well as numbers and letters.


All the test items are pictorial, that is, non-verbal. The test
has two forms. The battery is essentially a prognostic test: its
purpose is to forecast a child’s mental, sensory-motor, and
muscular readiness for first-grade work. Figure 5-3 shows sample
items.


Scope. The test is for the end of kindergarten and the
beginning of first grade. The test requires about sixty minutes of
working time.


Scoring. Norms in percentile ranks allow the teacher to estimate 
a pupil’s readiness for reading (based on tests 1-4), readiness for 
arithmetic (test 5), and general maturity for first-grade work 
(tests 1-6). In addition, a child’s score is given a rating from A to 
E. An A rating denotes an excellent risk, the other letters a lesser 
degree of certainty down to E, which implies almost certain 
failure. 
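The three readiness indices are groupings of the six test scores. This sketch sums the relevant raw scores, which is an assumption, since the published norms give percentile ranks for each grouping. The child's scores and the A-to-E cut-offs are invented, and the real letter ratings come from the test's own tables.

```python
# Invented raw scores for one child on tests 1-6.
scores = {1: 14, 2: 12, 3: 10, 4: 15, 5: 18, 6: 9}

# Invented cut-offs on the total score, highest first.
RATING_CUTS = [(76, "A"), (64, "B"), (45, "C"), (24, "D"), (0, "E")]

def readiness(scores):
    """Return (reading readiness, number readiness, total, letter rating)."""
    reading = sum(scores[test] for test in (1, 2, 3, 4))   # tests 1-4
    number = scores[5]                                     # test 5
    total = sum(scores[test] for test in range(1, 7))      # tests 1-6
    rating = next(letter for cut, letter in RATING_CUTS if total >= cut)
    return reading, number, total, rating

print(readiness(scores))
```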

Prognostic Value of the Metropolitan Readiness Tests. The test
battery as a whole forecasts general maturity for the first grade,
but its sub-tests may be used diagnostically, to provide
information about individual children. If Ben makes low scores on tests 1,
2, 3, and perhaps 4, for example, he is poorly prepared in
language for first-grade work, or he has too little familiarity
with and comprehension of language generally. If Louise earns
low scores on tests 4 and 6, she is probably too immature to
undertake written work. As these two tests measure visual perception
and hand-eye co-ordination, an eye examination and training in
motor skills may be indicated. Test 5 (numbers) shows readiness
for number work, and the child who scores high should be able
[...] at this age level. If a child [...] by age 7½, he should be
[...] and perhaps a psychologist.


FIGURE 5-3 Sample Items from Metropolitan Readiness Tests

Test 1. Word Meaning. In the first row, the child marks the baby;
in the second row, the house.

Test 4. Matching. In each row the child circles the picture identical
to the one in the circular frame.


Iowa Silent Reading Tests*


Description. This test consists of two batteries, one for
elementary schools and one for high schools and colleges. Both batteries
measure reading rate, vocabulary, sentence comprehension, 
paragraph reading, and skill in locating information. Speed is an 
element in the battery, as well as power. The Elementary Test 
includes a reading comprehension test called "directed reading," 
and the Advanced Test a test of poetry comprehension. 


Scope. The two batteries cover the following range: 


Elementary Test (four forms): grades 4-8
Advanced Test (four forms): high school and college freshmen

Working time for either battery is about 50 minutes.

Scoring. There are six sub-tests in the Elementary Test:

1. Rate and comprehension in reading connected prose.
2. Directed reading of prose to get answers.
3. Vocabulary and word meaning.
4. Paragraph reading: selecting the main idea and adding
appropriate details.
5. Sentence meaning: understanding brief sentences out of
context.
6. Work-study skills: alphabetizing and using an index.

Test 1 yields two scores (rate and comprehension) and Test 6
two scores (alphabetizing and use of index). Tests 2, 3, 4, and 5
yield one score each. These 8 sub-scores are converted into scaled
scores by means of tables appended to each test. Scaled scores
may be plotted on a profile to show the variations in
performance. Percentile norms are also provided by grade for each
sub-test and for total score. There are age and grade equivalents to
total score.

The Iowa Test can be expected to spot (a) the extremely slow
reader, (b) the careless reader who fails to follow directions,
omits necessary details, and skims over important facts, and (c)
the rapid but uncomprehending reader.
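The three kinds of poor reader correspond to different patterns on the scaled-score profile. This is a hedged sketch only. The cut-offs are hypothetical, and tying carelessness to the directed-reading score is an assumption made for illustration, since the text does not say which sub-test reveals it.

```python
# Hypothetical cut-offs on the scaled-score profile.
LOW, HIGH = 40, 60

def reading_flags(rate, comprehension, directed_reading):
    """Flag the three kinds of reader described above from a few scaled scores."""
    flags = []
    if rate < LOW:
        flags.append("extremely slow reader")
    if directed_reading < LOW:  # assumption: careless reading shows up here
        flags.append("possibly careless reader")
    if rate > HIGH and comprehension < LOW:
        flags.append("rapid but uncomprehending reader")
    return flags

print(reading_flags(rate=65, comprehension=35, directed_reading=50))
```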


Co-operative Mathematics Test for Grades 7, 8, and 9* 


Description. This test consists of four parts: I, skills; II, facts, 
terms, and concepts; III, applications; IV, appreciation. Ques- 
tions and problems cover basic arithmetic as well as simple algebra 
and geometry. The test may be used for survey purposes, but it
is perhaps more valuable in evaluation and guidance. Sample items
from the test are shown in Figure 5-4.


Evaluation and Adjustment Series (High School)**

Description. This is an extensive battery of subject-matter and
other tests (twenty-four so far and more to be added) designed
for use in high schools. The tests cover such traditional areas
as algebra, biology, geometry, physics, history, and literature. In
addition, there are tests of reading comprehension, “problems in
democracy,” health knowledge, and study skills. The content of
the tests has been drawn from standard textbooks, courses of
study, and professional literature. Tests may be administered as
separates or as parts of a general survey.


Scope. For survey and diagnosis in grades 9 through 12. There
are two forms for most tests.


Scoring. Raw scores are converted into scaled scores for each
test, so that comparisons may be made from test to test. Results
may also be compared graphically by means of a profile. Many
of the tests provide charts showing what score is to be expected
at given IQ levels. IQ's are from the Terman-McNemar Test of
Mental Ability. The reliability of the various tests in the battery
is satisfactory. The separate tests require from 45 minutes to an
hour of working time.

FIGURE 5-4 Sample Multiple-Choice Items from the Cooperative
Mathematics Test for Grades 7, 8, and 9

From Part I, Skills:
39. [...] 16 equals
    39-1 [...]
    39-2 2
    39-3 8
    39-4 4

From Part II, Facts, Terms, and Concepts:
7. Which of the following is a unit in the metric system?
    7-1 Ounce
    7-2 Centimeter
    7-3 Yard
    7-4 Bushel

From Part III, Applications:
24. If a man spends 12% of his salary on bonds, and buys a $37.50
    bond each month, what is his monthly salary?
    24-1 $312.50
    24-2 $312.60
    24-3 $350
    24-4 $376.20

From Part IV, Appreciation:
20. Which of the following has no volume?
    20-1 Cylinder
    Cone
    Cup
    20-5 Rectangular box
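Item 24 in Figure 5-4 reduces to one equation, 0.12 × salary = $37.50. This short check uses Python's decimal module, which avoids the binary rounding of money amounts, to confirm which option is keyed.

```python
from decimal import Decimal

# Item 24: 12 per cent of the monthly salary buys one $37.50 bond,
# so 0.12 * salary = 37.50 and salary = 37.50 / 0.12.
bond_cost = Decimal("37.50")
share_of_salary = Decimal("0.12")
salary = bond_cost / share_of_salary

options = {"24-1": Decimal("312.50"), "24-2": Decimal("312.60"),
           "24-3": Decimal("350"), "24-4": Decimal("376.20")}
keyed = [label for label, amount in options.items() if amount == salary]
print(salary, keyed)
```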
Co-operative French Test (Elementary)*

Description. The specifications for this test call for knowledge
of French grammar and vocabulary, plus the ability to use the
language in reading and translation. The test has three parts:
vocabulary, grammar, and reading. The vocabulary section is a
multiple-choice test of fifty words. Grammar (thirty-five items)
requires the selection of one of five choices to complete correctly
the translation of an English sentence into French. In the reading
section, forty incomplete sentences in French are to be completed
from a list of five options. The reliability of the test is high.


FIGURE 5-5 Sample Multiple-Choice Items from the Cooperative
Science Test for Grades 7, 8, and 9

From Part I, Informational Background:
3. It is believed that dinosaurs lost out in
   their struggle for existence chiefly because
   3-1 they were killed by man for food.
   3-2 man could not tame them.
   3-3 they were not adapted to changes
       that took place in the earth's surface and climate.
   3-4 they were not fitted to eat plant food.
   3-5 they had no brains.

From Part II, Terms and Concepts:
2. The instrument used to look at and study
   the surface of the moon and the planets is the
   2-1 galvanoscope.
   2-2 microscope.
   2-3 telescope.
   2-4 electroscope.
   2-5 radiometer.

13. If two plants of the same species but of
    different varieties are mated, the offspring
    are called
    13-1 mongrels.
    13-2 sports.
    13-3 biennials.
    13-4 lentils.
    13-5 hybrids.

From Part III, Comprehension and Interpretation:
This test consists of multiple-choice [...] a paragraph of
scientific prose [...] be understood and interpreted.


Scope. This test is intended for the first two years of high
school or for the first year of college study of French. 


Scoring. Scaled scores are provided for each of the three parts
of the test and for the total. There are percentile norms for
high-school and for college classes. Working time for the test is forty
minutes.


Co-operative Science Test (Grades 7, 8, and 9)* 


Description. There are three parts to this test: Part I, informa- 
tion and background; Part II, terms and concepts; Part III, com- 
prehension and interpretation. The test is planned to measure 
knowledge and application. Part II is in multiple-choice form. 
Part III consists of readings in science, each reading followed by 
questions designed to assess the student’s understanding, as well 
as his ability to interpret and apply what he has read (Figure 5-5).


Scope. Grade 9 and superior seventh and eighth graders. 


Scoring. There are scaled scores for the three parts and for the 
total. Percentile norms are given for grades 7, 8, and 9. The
working time for the whole test is about eighty minutes. Re- 
liability of the whole test is high. 


WHAT TO LOOK FOR IN AN EDUCATIONAL 
ACHIEVEMENT TEST 


The suitability of an educational achievement test for a given 
situation must be determined from an examination of its validity, 
its reliability, its scaling techniques, and its norms. The cost,
time, and personnel needed to administer and score the tests must 
also be considered. These same requirements apply to group tests 
of intelligence. Each of the main characteristics of a mental test, 
except perhaps validity, has been commented on at appropriate 
places throughout this chapter. A summary of the relevant data 
under each category will now be offered.

Validity. An educational achievement test is valid when it
measures what it undertakes to measure. Most subject-matter
tests possess content validity. An arithmetic test or a geography
test or a reading test, for example, is valid by definition when it
contains a sampling of arithmetic problems, geography questions,
and paragraphs to be read. The standardized educational test is
made up of items taken from a variety of sources: widely used
textbooks, courses of study, examination questions, and outlines.
The items in tentative form are checked by experienced teachers
and are put into objective form by test construction specialists. A
broad selection of items ensures a comprehensive sampling of
materials.


One validation technique employed in some educational tests is the following. The test is provisionally drawn up and is administered to an experimental group; only those items are retained which show an increasing percentage passing with age or with grade. Other techniques of item analysis will be described in Chapter 9. All of these procedures are directed toward selecting questions which will work together as a team, cover a wide range of difficulty, and be related closely in content (be homogeneous). The standard test, when finally made, is a compact and closely knit instrument for measuring what it purports to measure. Data on validation procedures will be found in most Manuals which accompany standardized achievement tests.
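The retention rule just described (keep only items whose percentage passing rises from grade to grade) can be sketched in a few lines of Python. This is an illustration only: the item labels and tryout proportions are hypothetical, "increasing" is read here as strictly higher at every successive grade, and the other item-analysis checks of Chapter 9 are left out.

```python
def items_to_retain(pass_rates):
    """Return IDs of items whose proportion passing rises with each grade.

    pass_rates maps a hypothetical item ID to a list of proportions passing,
    ordered from the lowest grade tried to the highest.
    """
    kept = []
    for item_id, rates in pass_rates.items():
        # Keep the item only if every grade passes it more often than the grade below.
        if all(lower < higher for lower, higher in zip(rates, rates[1:])):
            kept.append(item_id)
    return kept


# Hypothetical tryout data for grades 4, 5, and 6.
tryout = {
    "item_1": [0.30, 0.45, 0.62],   # rises steadily: retained
    "item_2": [0.55, 0.52, 0.58],   # dips at grade 5: dropped
    "item_3": [0.10, 0.25, 0.40],   # rises steadily: retained
}
print(items_to_retain(tryout))  # ['item_1', 'item_3']
```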


Reliability. A reliable test is one whose scores "stay put." The standard error (SE) of a score tells us how much fluctuation to expect in a child's score upon retest. If the standard error is three points, for example, the odds are two to one that Bill's score of 64 will, on a second trial on the test, vary up or down from the first determination by not more than three points. The smaller the SE of a test score, the greater the stability of the obtained score. The SE of a test score gives us more information concerning reliability than does the reliability coefficient alone (page 29).
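As a worked illustration of the "two to one" statement, the sketch below assumes the usual formula for the standard error of measurement, SE = SD × √(1 − r), and normally distributed errors. The SD of 10 and reliability of .91 are hypothetical values chosen so that the SE comes out at the three points used in Bill's example.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SE of an obtained score, from the test's SD and reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def prob_within(k_se):
    """Proportion of a normal distribution lying within +/- k_se standard errors."""
    return math.erf(k_se / math.sqrt(2.0))

# Hypothetical test: SD of 10 points, reliability .91.
se = standard_error_of_measurement(10, 0.91)   # 10 * sqrt(.09), about 3.0
p = prob_within(1.0)                           # about .68
print(f"SE = {se:.1f}; Bill's retest likely between {64 - se:.0f} and {64 + se:.0f}")
print(f"chance within 1 SE = {p:.2f}, odds about {p / (1 - p):.1f} to 1")
```

The odds come out slightly above two to one (about .68 against .32), which is why the text can speak of "two to one" as a round figure.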

Scores obtained on most standard tests are highly stable, but part scores based on relatively few items are variable and may be quite unreliable. Conclusions as to strengths and weaknesses based on unstable scores are always tentative, and must be regarded as suggestive only.


Scaling. Most educational achievement tests are first scored in arbitrarily assigned points, so many points being given for each correct answer. These point scores are usually converted into scaled scores by means of tables printed at the end of the sub-test. The meaning of standard scores and of T-scores has been discussed in Chapter 2. Raw or obtained scores (point scores) from the sub-tests of a battery, which differ in length, difficulty, and content, cannot be compared or combined as they stand. When scaled, scores expressed in different units become comparable.
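A minimal sketch of why scaling makes sub-test scores comparable: each raw score is expressed as a deviation from its own norm group's mean in SD units (z), then as a T-score with mean 50 and SD 10. The norm-group means and SDs below are hypothetical. The sketch uses a simple linear T-score, whereas publishers' conversion tables may use normalized scales, so it is a simplification.

```python
def t_score(raw, mean, sd):
    """Convert a raw score to a linear T-score (mean 50, SD 10) within one norm group."""
    z = (raw - mean) / sd
    return 50 + 10 * z

# Hypothetical norm-group statistics for two sub-tests of unequal length.
spelling = t_score(raw=42, mean=36, sd=6)      # z = +1.0, so T = 60
arithmetic = t_score(raw=18, mean=24, sd=4)    # z = -1.5, so T = 35
print(spelling, arithmetic)  # 60.0 35.0
```

The raw scores 42 and 18 cannot be compared directly, but on the common scale the pupil stands one SD above the norm group in spelling and one and a half SD below it in arithmetic.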


Scaled scores (and sometimes raw scores) are usually converted into age and/or grade equivalents, that is, into the age and grade values which correspond on the average to the given scores. If the average child of 9 years and 4 months earns a score of 38 on an Arithmetic Fundamentals Test, then the score of 38 "equals" an educational age (EA) of 9-4. If children who are half way through the seventh grade (that is, at 7.5) earn a mean score of 63 on a Reading Test, the score of 63 has a grade equivalent of 7.5.

The educational age (EA) may be divided by the chronological age (CA) to give an educational quotient (EQ):

$$\mathrm{EQ} = 100 \times \frac{\mathrm{EA}}{\mathrm{CA}}$$

This EQ is a measure of acceleration and is somewhat analogous to the IQ. The EA and EQ are often useful, provided they are taken to refer only to the tests on which they are based and are not thought of as general indices.
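A short worked example of the EQ calculation, taking EQ = 100 × EA/CA by analogy with the ratio IQ (the factor of 100 is the usual convention and is an assumption here). Ages are written in the book's years-months form and converted to months before dividing. The chronological age of 10-0 is hypothetical.

```python
def to_months(age):
    """Convert an age written as 'years-months' (e.g. '9-4') to total months."""
    years, months = (int(part) for part in age.split("-"))
    return 12 * years + months

def educational_quotient(ea, ca):
    """EQ = 100 * EA / CA, with both ages given as 'years-months' strings."""
    return 100 * to_months(ea) / to_months(ca)

# A pupil with the EA of 9-4 from the example above, assumed to be 10-0 years old.
eq = educational_quotient("9-4", "10-0")
print(round(eq))  # 93
```

An EQ below 100 indicates that the pupil's achievement on this particular test lags behind his age; as the text warns, it says nothing about other tests.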


Norms. Norms are typical measures of performance. In a standard educational test, the mean score made by a large and representative group of fifth-grade pupils is the norm for fifth-grade children on this test. Norms are expressed in age and grade equivalents, as percentile ranks, and in the form of scaled scores. A child's grade placement is found by computing the tenths of the school year which have passed before the test was given. If the school year begins about September 1 and ends June 15, a sixth-grade class tested in the period between March 16 and April 15 is assigned the grade position of 6.7; the class is 7/10 of the way into the school year. Most standard educational achievement tests report nation-wide norms in their Manuals. These typical performances are based on the achievements of large groups of children from all over the country. As we have pointed out, local norms (for the city or state or both) are often better measures of pupil achievement. Any pupil's scores relative to those of other pupils should be evaluated in terms of his effort, his intelligence, and his home and community.
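The grade-placement rule can be written as a small function. The boundaries are inferred from the single example given (March 16 to April 15 = .7). The sketch assumes a September 1 to June 15 school year divided into ten periods that change on the 16th of each month: Sept. 1 to 15 = .0, Sept. 16 to Oct. 15 = .1, and so on up to May 16 to June 15 = .9. The function name is illustrative.

```python
def grade_placement(grade, month, day):
    """Grade placement (e.g. 6.7) for a class in `grade` tested on month/day.

    Assumes ten periods per school year with boundaries on the 16th of each
    month, from September 1 through June 15.
    """
    months_since_sept = (month - 9) % 12           # September -> 0, March -> 6
    tenths = months_since_sept + (1 if day >= 16 else 0)
    if tenths > 9:
        raise ValueError("date falls outside the school year")
    return round(grade + tenths / 10, 1)

print(grade_placement(6, 3, 20))   # 6.7 (between March 16 and April 15)
print(grade_placement(6, 4, 15))   # 6.7
print(grade_placement(6, 9, 10))   # 6.0
```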

Other Factors in the Selection of Tests

... gram, the personnel required, and the time it will take from other school activities: all these ... test or tests. Tests which ... a class period, which can be scored objectively (by means of ... which are acceptable in form ...




SUGGESTIONS FOR LABORATORY WORK

1. Administer two or three standardized achievement tests to the class, cutting the time to one-half if necessary. Have students score their own tests and plot profiles where called for.

2. Analyze a standard reading test, listing the objectives which you think the author had in mind. Do you agree that these objectives were fulfilled?
3. Select a test taken in exercise 1. Consult the Manual for data on validity, reliability, scaling procedures, and norms.


QUESTIONS FOR DISCUSSION

1. For which of the following purposes would a standardized achievement test be useful:
(1) To discover which pupils have not mastered multiplication and division of fractions.
(2) To determine which pupils are reading too slowly.
(3) To determine for the class which punctuation skills need further work.
(4) To section the class into two groups for teaching arithmetic.
(5) To discover the subjects in which each pupil is strong and in which weak.
2. A teacher lists the following as objectives of a course in history and civics:
(1) To present facts in the field.
(2) To prepare the class for the duties of citizenship.
(3) To further appreciation of democracy.
(4) To foster criticism of governmental processes.
(5) To aid pupils in thinking about problems in government.
Which of these objectives is the teacher most likely to fulfill?
3. The Manual of Test ABC states that the test may be used for diagnostic purposes. What do you look for in a test to determine whether it has diagnostic value?
4. A professor of English states that batteries of standard tests tell the English teacher nothing that cannot be better found out from a theme and an interview. Do you agree?
5. Why is it necessary that sub-tests in a battery to be used for diagnosis have high reliability?
6. In some schools, teachers prepare for a testing program by having students review older standard examinations. What effect could this have on the students' morale? On the comparability of test results from school to school? Is it good educational practice?
7. In School A, the pupils in grades 4 to 7 are given the California Achievement Tests. Scores are recorded in grade equivalents only. What other types of scores would be valuable? Why?
8. The Manual of a reading test reports a correlation of .40 with English marks in the first year of high school. Is this good evidence of validity? Discuss.
9. Suppose that the Metropolitan Achievement Tests have been administered in grade 5 in October. How might you, as the teacher, use the results of the test?
10. For what predictive purposes would it be desirable to have the results from the following tests:
(1) A test of ability to read difficult scientific prose drawn from various fields.
(2) A test of skill in grammar: punctuation, capitalization, sentence structure, and so on.
11. How could you use the results from a group intelligence test to supplement scores made by your pupils on an achievement battery?
12. Is it important to have tests of speed, as well as of power?


CHAPTER 6 


APTITUDE TESTS 


When a youngster possesses traits and abilities which enable him to speak French readily, acquire mathematics, deal handily with tools, or play a musical instrument well, he is said to have aptitude for the given activity. Aptitudes are probably inherited basically, but they cannot appear unless the environment is favorable, that is, unless the opportunity is provided. Very often some training, often a great deal of it, is necessary, too, before an aptitude reveals itself in performance.

Aptitude tests are not essentially different in form or in content from intelligence and educational achievement tests, since all mental tests are in reality measures of aptitude. Intelligence tests measure capacity for school work and for vocations requiring school training; and achievement tests measure proficiency in English grammar, mathematics, science, and other subjects. Perhaps the chief difference between these tests and those designed to measure aptitudes is the fact that an aptitude test is concerned almost entirely with the future, that is, with prognosis. Thus an engineering aptitude test is used typically to forecast an examinee's chances of success in engineering. The aptitude test alone is, of course, rarely able to provide a wholly satisfactory estimate of probable performance later on. For an individual's efforts to be maximally effective, aptitude must be supplemented by training. Furthermore, the examinee must possess initiative, interest in the job, and favorable personality characteristics.

We have classified aptitude tests under four heads: (1) general, (2) special, (3) professional, and (4) talent. The two best-known kinds of general aptitude battery are those designed to assess aptitude for (a) mechanical tasks and (b) clerical work. Many special tests (of speed, co-ordination, and reaction time) have been devised to measure aptitudes believed to be crucial in industry. Achievement tests, too, are employed as aptitude tests to reveal an examinee's performance in languages or mathematics, for example, and hence provide a measure of his promise in more advanced courses. In the field of professional work, aptitude test batteries have been assembled to assess the traits believed necessary for success in medical school, in law, in engineering, and in teaching. Aptitude in music and art is generally called talent, and tests are available to forecast achievement in these fields.


GENERAL APTITUDE BATTERIES 


The general aptitude battery attempts to forecast probable success in a number of related tasks or vocations by sampling a wide range of behaviors believed to be involved in the activity. In this section, two batteries designed to measure aptitude for mechanical work are described, together with two batteries planned to measure aptitude for clerical proficiency.


Mechanical Aptitude 


The term "mechanical aptitude" includes a variety of behaviors. One of the earliest mechanical aptitude tests consisted of a box containing a number of common gadgets in separate compartments. Each of these contrivances (a lock, door bell, clothes pin, and so on) was to be assembled with the aid of simple tools. The score was determined by the speed and accuracy of assembly. This kind of test is often described as a "job sample" or "vocational miniature," since it involves what has to be done on a small scale. Among the sub-tests in paper-and-pencil batteries devised to measure mechanical aptitude are (1) tests requiring motor speed and dexterity of movement; (2) tests of the ability to visualize or perceive mechanical and spatial relations (important in reading blueprints and architectural drawings); (3) tests of mechanical information concerning tools, machines, and the construction and use of various contrivances; and (4) tests of mechanical reasoning as demonstrated in the ability to solve problems dealing with tools, pulleys, levers, machine parts, and the like. In addition, in assessing mechanical aptitude, inventories are used which are designed to reveal interest in mechanical things. Such interest may be shown, for example, when a boy has his own tools, tinkers with radios, reads Popular Science avidly, and builds space machines. One of the most useful findings to come out of the testing program in World War II was the discovery that paper-and-pencil tests of mechanical aptitude are as predictive of success in many mechanical jobs as are actual job samples covering the work.

The following two test batteries are representative of the best tests in this field:
MacQuarrie Test of Mechanical Ability 
Bennett Mechanical Comprehension Test 
MacQuarrie Test of Mechanical Ability*


Description. This battery consists of seven paper-and-pencil tests, as follows:

1. Tracing: following a narrow path.
2. Tapping: making dots rapidly.
3. Dotting: placing dots precisely.
4. Copying: making a figure from co-ordinates.
5. Location: locating items by co-ordinates.
6. Block Counting: counting hidden blocks in a stack.
7. Pursuit: tracing a line through a tangled pattern.


Sample items from the MacQuarrie tests are shown in Figure 6-1.

All these tests are relatively simple and all are speeded: testing times are short. The MacQuarrie tests are designed to measure hand-eye co-ordination, finger movement and speed, manual dexterity, visual acuity, and spatial perception of direction and size. Taken as a whole, the MacQuarrie battery measures motor dexterity at a fairly low level of difficulty rather than aptitude for engineering or for architecture. For the latter, the Bennett Mechanical Comprehension Test is recommended. Some of the MacQuarrie sub-tests are predictive of special tasks: the tests in tracing, dotting, and pursuit, for example, measure aptitude for typing; and tests of block counting, tracing, pursuit, location, and copying are related to performance in mechanical drawing and the reading of blueprints. The Manual which accompanies the MacQuarrie advises the use of sub-test patterns for predicting success in various jobs.


Scope. The MacQuarrie test can be administered from grade 7 on. It has been employed chiefly in the prediction of success in factory and other manual-manipulative work.

Scoring. Percentile norms are available for the sub-tests and for total score. The working time for the whole test is about twenty minutes. Since some of the tests in the battery are allotted only ten to twenty seconds, a stop watch is needed in order to time the tests accurately. The reliability of the whole test is high. Reliabilities of the seven sub-tests are lower, but are fairly satisfactory for such short tests.

* Published by the California Test Bureau, Los Angeles, Calif.

FIGURE 6-1 Sample Items from the MacQuarrie Test of Mechanical Ability
(Panels: Dotting, "Place a dot in each circle as rapidly as possible"; Copying, "Copy figure by joining dots"; Blocks, "How many blocks touch each block with an X on it?"; Pursuit, "Follow each line by eye and show where it ends, by writing its number in the correct box at the right.")
Reproduced by permission of the California Test Bureau.


Bennett Mechanical Comprehension Test*

Description. This is a paper-and-pencil test in which comprehension of mechanical relations is determined by means of pictures and sketches. The test is fairly advanced in difficulty. Each picture or drawing has a simply phrased question designed to reveal the examinee's understanding of the mechanical problem presented. Figure 6-2 shows samples from the test battery.

Scope. There are four forms of the Bennett test. Form AA, the easiest, is suitable for trade and high schools and for less well trained workers. Form BB, more difficult, is for engineering school applicants, technicians, and engineers. Form CC, the most difficult, differentiates among examinees of high ability levels. The fourth form, WI, is for women.

* Published by The Psychological Corporation, New York, N. Y.

FIGURE 6-2 Samples from Bennett Mechanical Comprehension Test
(Sample questions: "Which room has more of an echo?" "Which would be better shears for cutting metal?" "Which gear turns slower?" "Which cart is more likely to tip over on the hillside?")
Reproduced by permission of The Psychological Corporation.


Scoring. Percentile norms, which are supplied for each test form, are applicable to a variety of student and occupational groups. The test is valuable in guidance, in selecting applicants with aptitude for mechanical thinking, and in the selection of students wanting to study mechanics and engineering. The MacQuarrie is a useful supplement to the Bennett test when speed and manual dexterity are required as well as more abstract thinking about mechanical relations.

The reliability of the Bennett is satisfactory. Validity is hard to determine, but the test is valid in relation to such criteria as grades in high-school shop courses and occupational and industrial performance.
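Validity "in relation to such criteria" is usually reported as a validity coefficient: the correlation between scores on the test and scores on the criterion. The sketch below computes a Pearson product-moment r in plain Python. The test scores and shop marks are invented for illustration and are deliberately tidy, so the coefficient comes out much higher than those typically found with real tests and real criteria.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores and high-school shop marks for eight pupils.
test_scores = [32, 45, 28, 51, 39, 44, 25, 48]
shop_marks = [70, 82, 68, 88, 75, 77, 65, 84]
print(f"validity coefficient r = {pearson_r(test_scores, shop_marks):.2f}")
```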


Clerical Aptitude 


Tests planned to gauge clerical aptitude are concerned mainly with perceptual speed and accuracy in reading, writing, and marking, and with manual dexterity and skill. Office workers are designated in several ways, such as general clerk, sales clerk, shipping clerk, filing clerk, typist, and receptionist. The jobs differ in the kind and variety of their duties, but all demand (to a greater or lesser extent) reading, writing, sorting, checking, filing, folding, sealing, and stamping.

The present section will describe two tests of clerical aptitude, the first fairly narrow in functions covered, the second much broader.


Minnesota Clerical Test 

General Clerical Test 

Minnesota Clerical Test*

Description. This battery covers speed and accuracy in perceiving clerical detail. There are two parts, number comparison and name comparison. In the first, the examinee is shown two hundred pairs of numbers, each containing from 3 to 12 digits. If the two numbers are alike, the examinee places a check (✓) between them; if they are unlike, he leaves the space blank. In the second test, proper names (which match or fail to match) are substituted for number pairs. Samples are shown below:

79542                   79524
5794367                 5794367
John C. Linder          John C. Lender
Investors' Syndicate    Investors' Syndicate

* Published by The Psychological Corporation, New York, N. Y.


The Minnesota Clerical Test is not designed to encompass all the factors which make for proficiency in office work, but it does attempt to predict ability to handle addresses, bills, accounts, and so on. The Minnesota test has been found to have prognostic value in the selection of clerks, packers, checkers, inspectors (of products), and workers in other factory jobs.


Scope. This clerical test may be used with students from junior high school on and for adults.

Scoring. The working time of the test is about fifteen minutes, so that both speed and accuracy enter into a score. Individual differences in speed and care appear in the scores and must be taken into account in interpreting the test. A very careful examinee, for example, may make few errors but earn a relatively low score because of slowness and over-cautiousness. On the other hand, a fast but careless worker may mark more items but tend to make many errors. Percentile norms are available for boys and girls, junior and senior high-school students, and several groups of industrial workers. Among the latter there are norms for women who are machine operators, typists, and clerks; and for men who are bank tellers, accountants, and various sorts of clerks. A high score earned by a student does not necessarily mean that this examinee will make a good clerical worker, though it is a decidedly good omen. On the other hand, a high-school counselor would certainly be wise to question the vocational promise of a commercial and business student who scored below the twenty-fifth percentile of clerical workers. The reliability of the test is high.
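To see what "below the twenty-fifth percentile of clerical workers" means in practice, the sketch below computes a student's percentile rank within a norm group. It uses the common definition: the percentage of the group scoring below the student, plus half of those tied with him. The norm group of twenty scores and the student's score are hypothetical. Published tables are built from much larger groups, so this is only an illustration of the idea.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group below the score, counting half of those tied."""
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * tied) / len(norm_scores)

# Hypothetical norm group of 20 employed clerical workers.
clerical_norms = [88, 95, 102, 104, 110, 112, 115, 118, 120, 121,
                  125, 127, 130, 133, 136, 140, 142, 147, 151, 160]
student_score = 103
pr = percentile_rank(student_score, clerical_norms)
print(pr)  # 15.0
if pr < 25:
    print("below the 25th percentile of clerical workers: review vocational plans")
```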


General Clerical Test (GCT)* 


Description. This test battery is designed to measure three kinds of aptitude judged to be valuable in office work. There are nine sub-tests in the battery: Parts I and II test clerical speed and accuracy; Parts III, IV, and V, numerical ability; Parts VI, VII, VIII, and IX, verbal facility. The first two (checking and alphabetizing) measure perceptual speed and accuracy as expressed in such activities as sorting, coding, and alphabetizing. The next three measure numerical aptitude as shown in computation, error location, and arithmetical reasoning. The last four measure verbal facility by means of spelling, reading comprehension, vocabulary, and grammar. The over-all score is a good measure of abstract intelligence, as well as of aptitude for clerical work. The test is to be recommended, therefore, for clerical jobs which demand a relatively high level of intelligence.


Scope. The battery is intended for use with high-school and business-school students. The GCT may also be valuable when testing applicants for more responsible clerical positions. The working time for the test is about fifty minutes. Norms are available for high schools and for business schools, as well as for various sorts of clerical workers. Norms for each sub-test, as well as for total score, are provided.

The reliability of the whole test is high, greater than .90. The reliability of the sub-tests is much lower, and the counselor must be tentative in judgments based upon parts of the test.

* Published by The Psychological Corporation, New York, N. Y.
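Why a battery can be highly reliable while its parts are not is shown by the Spearman-Brown formula. This standard psychometric result predicts reliability when a test is lengthened or shortened n times: r_n = n·r / (1 + (n − 1)·r). The sketch below treats a GCT sub-test as one-ninth of a battery with reliability .90. That assumes nine equal, parallel parts, which is an idealization: the actual sub-tests differ in length and content.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened (or shortened) by length_factor."""
    r, n = reliability, length_factor
    return n * r / (1 + (n - 1) * r)

# Whole battery r = .90; a sub-test one-ninth as long (nine equal parts assumed).
print(round(spearman_brown(0.90, 1 / 9), 2))   # 0.5
# Doubling a sub-test of reliability .50 raises the predicted reliability.
print(round(spearman_brown(0.50, 2), 2))       # 0.67
```

A predicted reliability of about .50 for each part illustrates why the counselor must be tentative in judgments based upon parts of the test.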
APTITUDE TESTS IN SPECIAL AREAS*


In this section five test batteries often useful to the educational 
counselor and classroom teacher will be described. These spe- 
cialized examinations are illustrative of many tests in this field: 


Differential Aptitude Tests 

Minnesota Paper Form Board 

Murphy-Durrell Diagnostic Reading Readiness Test 
Orleans Algebra Prognosis Test 

Turse Short-Hand Aptitude Test 


Differential Aptitude Test (DAT)** 


Description. This battery is designed for educational and vocational guidance of high-school students. There are seven sub-tests, each of which yields a separate score:

Verbal reasoning: A difficult verbal analogies test, which measures ability to handle verbal relations. Aspirants for professions should earn high scores.

Numerical ability: An arithmetic test covering a wide range of operations. This test is an important predictor in science and engineering.

Abstract reasoning: A non-language test which demands the solution of problems expressed in diagrams and figures. The test measures a high level of abstract intelligence.

Space relations: Ability to perceive a three-dimensional object from a two-dimensional pattern. Useful in engineering, architecture, and drafting.

Clerical speed and accuracy: A test of speed and accuracy in the performance of clerical tasks. Speed is an important factor.

Mechanical reasoning: A form of the Bennett Mechanical Comprehension Test. Useful as a predictor of engineering aptitude when combined with the first four tests above.

Language usage: Two tests, scored separately, which measure the ability to spell and to locate errors in sentences. Emphasizes the mechanics of language, as compared with the first test (verbal reasoning), which emphasizes abstract comprehension.

* Under aptitude tests are often listed sensory-motor tests of visual and auditory keenness, as well as special tests of motor skills, dexterity, and co-ordination. Apparatus tests of this sort are valuable in industry and the military service, but they are not used routinely in the schools and will not be described here. Some of the devices are very complex and require specialized training on the part of the examiner. Oral Trade Tests constitute another sort of specialized aptitude test which will not be treated here. These tests are really oral interviews, are administered individually, and are valuable in appraising the vocational training and work experience of an applicant.

** Published by The Psychological Corporation, New York, N. Y.


FIGURE 6-3 Sample Items from the Differential Aptitude Tests
(Panels: Mechanical Reasoning, "Which man in this picture has the heavier load?"; an item set in which one of five combinations is underlined and the same combination is to be found and marked on the answer sheet; Language Usage I, Spelling, "Indicate whether each word is spelled right or wrong" (examples: man, gurl); Language Usage II, Sentences, "Decide which of the lettered parts of each sentence contains errors, and mark the corresponding letters on the answer sheet" (example: "Ain't we / going to the / office / next week / at all.").)
Reproduced by permission of The Psychological Corporation.
The illustrative items in Figure 6-3 show the nature of the sub-tests.

A main feature of the DAT is that the total score is broken down into several components, so that from a student's profile we have a record of comparative performance in eight fundamental activities. The Manual gives explicit instructions for administering and scoring the test battery. In addition, a Casebook illustrates the use of the profile in diagnosis, and will be helpful to guidance counselors.


Scope. For grade 8 and for high-school grades 9-12. 


Scoring. Percentile norms are supplied for grades 8 through 12 for total score and for scores on each sub-test. Since there are large sex differences, percentile norms are given for boys and girls separately. Scaled scores (with a mean set at 50) are employed in plotting the profiles. Figure 6-4 shows the profile of a boy who could profit from educational counseling. Note that James is high in the space and mechanical tests, but mediocre to low in all the others. The boy is certainly not "verbally minded," although he appears to have real talent in mechanics. The teacher will understand James better if he has his profile available.

The DAT represents the modern practice of substituting a number of analytic scores (for example, on a profile) for a single over-all score. We have noted (page 113) that diagnosis of strong and weak points from short sub-tests is always precarious because of their low reliability. The reliability of the total DAT is very high, and the authors have increased the value of a diagnosis from the sub-tests by computing the minimal difference between sub-test scores which will be significant, that is, not due to chance. This makes it possible to decide, for instance, whether Roy's score in abstract reasoning differs significantly from his score in clerical speed and accuracy, or whether his scores in reasoning and numerical ability differ significantly.
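One standard way to compute such a minimal significant difference is from the standard error of the difference between two scores on a common scale: SE_diff = SD × √(2 − r₁ − r₂), which equals √(SE₁² + SE₂²) when both sub-tests share the same SD. A difference of about 1.96 × SE_diff is needed for significance at the .05 level. The sketch assumes profile scores with mean 50 and SD 10 (the text gives only the mean), and the two reliabilities and Roy's scores are hypothetical. It is not a reproduction of the DAT authors' published figures.

```python
import math

def min_significant_difference(sd, r1, r2, z=1.96):
    """Smallest difference between two sub-test scores on a common scale with
    standard deviation sd that is unlikely to arise by chance (z = 1.96: .05 level)."""
    se_diff = sd * math.sqrt(2 - r1 - r2)
    return z * se_diff

# Scaled scores with mean 50 and an assumed SD of 10; hypothetical reliabilities.
needed = min_significant_difference(10, 0.90, 0.86)
print(round(needed, 1))   # 9.6
abstract_reasoning, clerical = 58, 46                  # hypothetical scores for Roy
print(abs(abstract_reasoning - clerical) >= needed)   # True
```

The less reliable the two sub-tests, the larger the difference required, which is the formal reason diagnosis from short sub-tests is precarious.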

Despite its general excellence, the DAT has some practical drawbacks to its use in school. For one thing, the battery is long (working time approximately three hours) and the cost relatively high. Good norms are available for the high-school grades (boys and girls taken separately), but there are relatively few data on occupational and vocational groups. The battery appears to have content validity, and various experimental studies indicate that it possesses empirical validity. For example, workers in the electrical, mechanical, and building trades score above average on mechanical reasoning, and clerks are about average in numerical ability, in clerical speed and accuracy, and in language usage. Engineering students score very high on all the sub-tests except the clerical test, and even on that one they are above the mean. Men in the skilled trades (baker, butcher) are average in mechanical reasoning, and low in numerical ability, abstract reasoning, and space relations. Pre-medical students score high on all sub-tests, and especially high in verbal reasoning, numerical ability, and sentences. In the high school, verbal reasoning and sentences are predictive of grades in English; numerical ability, verbal reasoning, and abstract reasoning show substantial correlations with mathematics and science. Unfortunately, the data do not reveal how successful a man is likely to be over a period of time in a profession, occupation, or trade. But the tests often provide significant clues.

FIGURE 6-4 Profile of a High-School Boy on the Differential Aptitude Tests
(Individual Report Form, Differential Aptitude Tests, by G. K. Bennett, H. G. Seashore, and A. G. Wesman; The Psychological Corporation, New York.)
Reproduced by permission of The Psychological Corporation.


Minnesota Paper Form Board (MPFB)* 


Description. This is a well-known paper-and-pencil test dealing with spatial relations. It represents an effort to put a formboard on paper. Sample items are shown in Figure 6-5.

Each test item presents a geometrical figure cut into two or more parts. The examinee is to decide how the parts would look if fitted into a complete figure; he does this by selecting the drawing which shows the correct arrangement. Studies have shown the Minnesota Paper Form Board to be a good index of ability to perceive spatial relations and to manipulate figures in two dimensions. The test is useful as an aid in predicting success in shop work, grades in technical courses, success in dentistry and art work, and shop and factory output. It does not tap the more intellectual aspects of engineering, such as the ability to use symbols in solving problems. But it does test one component in engineering skill. A boy scoring high in the MPFB is not necessarily apt in engineering, dentistry, or art, but he has promise and is worth further examination. On the other hand, a boy who scores low had best be encouraged to try some other kind of work. As often happens, we can give negative educational and vocational advice with far greater assurance than we can offer a positive course of action. Thus, we can tell a youngster that he had better not attempt engineering, but we cannot always offer him specific advice as to just what he should do.

* Published by The Psychological Corporation, New York, N. Y.

FIGURE 6-5 Sample Items from the Minnesota Paper Form Board Test
(Directions: For each item choose the figure which would result if the pieces in the first section were assembled.)
Reproduced by permission of The Psychological Corporation.


Scope. Grade 7 and above.

Scoring. Norms are available for school grades and for various occupational groups. There are two forms of the MPFB, and the test is easy to administer and score. Counselors have found the test useful as a supplement to verbal intelligence and achievement tests, especially for students planning to study architecture, engineering, commercial art, and other vocations requiring spatial perception and visualization.

Murphy-Durrell Diagnostic Reading Readiness Test* 
Description. This test has been designed to measure three char-
acteristics believed to be important in the acquisition of reading
skills: auditory discrimination, visual discrimination, and learn-
ing rate. Like other readiness tests it is prognostic, in that it
forecasts whether or not a child is ready to begin reading. It is
also an achievement test and could be so classified, as it measures
the educational maturity of a youngster. The Metropolitan Read-
iness Tests (page 118) may, in turn, be classified as aptitude tests
rather than as achievement tests.

* Published by the World Book Company, Yonkers, N. Y.

The Murphy-Durrell test provides useful information for the
first-grade teacher in deciding when to start a formal reading
program and what outcomes to expect. At the same time, a good
intelligence test will be useful in estimating general mental
maturity.


Scope. Early in the First Grade or before. 


Scoring. Test 1 and Test 2 (auditory discrimination and visual
discrimination) require about an hour each. Test 3 (learning) is
both an individual and a group test; there are twenty minutes for
group instruction and three brief individual periods. Obtained
raw scores are converted into percentile norms for Tests 1 and 2;
ratings are used in Test 3.
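Since several tests in this chapter report results as percentile norms, a short sketch may make the conversion concrete. It assumes one common definition of percentile rank (the percentage of the norm group scoring below a raw score, plus half of those scoring exactly at it). The function name `percentile_rank` and the ten-child norm group are hypothetical illustrations, not taken from the Murphy-Durrell manual, whose own published tables would be used in practice.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw_score: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below raw_score,
    counting half of those who obtained exactly raw_score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)            # strictly lower scores
    equal = bisect_right(ordered, raw_score) - below   # ties at raw_score
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical norm group of ten children (not actual test data)
norm_group = [12, 15, 15, 18, 20, 21, 23, 25, 28, 30]
print(percentile_rank(20, norm_group))  # 45.0
```

Counting half of the tied cases keeps a score that sits in the middle of a cluster from being ranked either too high or too low.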


Orleans Algebra Prognosis Test (Rev.)* 


Description. This is a prognostic test, the purpose of which is to
determine whether a pupil is likely to succeed in (is ready for)
algebra. The test is administered before the pupil undertakes the
study of algebra. There are nine parts, consisting of simple
lessons covering some aspect of algebra (for example, use of
symbols, substitution in equations, literal nomenclature, and
solving of problems), followed by tests on the material presented.
An arithmetic test and a summary test of the material are in-
cluded. The test has been shown to have good prognostic value,
as indicated by its correlations with algebra grades and achieve-
ment test scores in algebra.


Scope. For students planning to study algebra.

* Published by the World Book Company, Yonkers, N. Y.

Scoring. The test requires about forty-five to fifty minutes.
There are percentile norms corresponding to point scores. Fur-
ther, there are expectancy charts predicting how well a child
making a certain score can be expected to do in algebra. The
reliability of the test is satisfactory.
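An expectancy chart of the kind mentioned above is built from follow-up records: pupils who took the prognosis test earlier are grouped into score bands, and for each band we count how many later earned each algebra grade. The sketch below assumes ten-point bands and letter grades; the records and the function name `expectancy_table` are hypothetical and do not reproduce the Orleans charts themselves.

```python
from collections import Counter, defaultdict


def expectancy_table(records, band_width=10):
    """Group past pupils' prognosis scores into bands and report, for each
    band, the percentage of pupils who later earned each algebra grade."""
    by_band = defaultdict(Counter)
    for score, grade in records:
        band = (score // band_width) * band_width  # e.g. 63 -> 60
        by_band[band][grade] += 1
    table = {}
    for band in sorted(by_band):
        counts = by_band[band]
        total = sum(counts.values())
        table[band] = {g: round(100 * n / total) for g, n in sorted(counts.items())}
    return table


# Hypothetical follow-up records: (prognosis score, later algebra grade)
past_pupils = [(72, "A"), (75, "B"), (78, "A"),
               (63, "B"), (66, "C"), (61, "B"),
               (54, "C"), (58, "D")]

table = expectancy_table(past_pupils)
for band, row in sorted(table.items(), reverse=True):
    print(f"{band}-{band + 9}: {row}")  # labels assume the default band_width of 10
```

Read row by row, the chart says, for example, that of past pupils scoring in the 70s, about two in three earned an A. A real chart would of course rest on far more cases than this toy sample.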


Turse Shorthand Aptitude Test* 


Description. This test is illustrative of aptitude tests developed
for use with commercial and vocational subjects. The purpose of
the test is to determine whether an examinee is likely to be
successful in learning shorthand. There are seven sub-tests: strok-
ing, spelling, phonetic association, symbol transcription, word
discrimination, dictation, and word sense.


Scope. For students planning to study shorthand. 


Scoring. There are percentile norms for students beginning the
study of shorthand. The Turse test is correlated with achieve-
ment in shorthand and is valuable in prognosis. The working
time of the test is about an hour.


APTITUDE TESTS FOR THE PROFESSIONS 

Tests of aptitude for the professions are primarily achievement
tests designed to forecast a student’s chances of success in train-
ing for medicine, law, or engineering. These tests are specialized
in content and are essentially work samples in the designated
field. Professional aptitude batteries are validated against grades
in courses. It is not known precisely just how predictive these
tests are of success in the actual practice of a profession, but
there is some evidence that such aptitude tests are related (some-
times highly related) to later success.
The classroom teacher should be familiar with the general
content and purpose of the professional aptitude tests, though
he will rarely be called on to administer or score them. These
batteries are not generally available, are often part of a testing
program, are highly specialized, and are usually scored and
interpreted in a testing center. We shall, accordingly, give a less
detailed description of them.

* Published by the World Book Company, Yonkers, N. Y.


Medical College Admission Test*

Description. This test consists of four parts: verbal, quantitative,
understanding modern society, and science. The verbal section
includes tests of vocabulary and reading comprehension tests
in science, social studies, and the humanities. The quantitative
part requires that the examinee solve problems making use of
numbers and symbols. The “understanding society” section is a
multiple-choice examination covering current social, economic,
and political affairs. The science part of the test contains ques-
tions drawn from pre-medical courses in biology, chemistry, and
physics. Samples from the various sections reveal the character
of the examination.


Verbal section 


sporadic: (A) immediate, (B) regular, (C) occasional, (D) alter-
nate, (E) replete


Quantitative part 


12. One-fifth of a batch of 2000 radio tubes were defective. If one-
fourth of the first 1000 were defective, what fraction of the
second 1000 were defective?

(A) 1/20 (B) 1/10 (C) 3/20 (D) 9/40 (E) 3/10


Understanding society

18. Which of the following was the primary objective of the
nations which signed the North Atlantic Pact?
(A) To form an alliance for military conquest
(B) To insure economic stability in democratic states
(C) To replace the Marshall Plan with a new alliance
(D) To . . . the effectiveness of the Soviet veto in the . . .
(E) To unite for collective defense

Science
21. A sodium atom and a sodium ion
(A) contain the same number of electrons
(B) contain the same number of protons
(C) have the same chemical properties
(D) have the same physical properties
(E) have different atomic numbers
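Item 12 of the quantitative sample can be worked step by step; exact fractions avoid any rounding. The function name and its default arguments are simply the figures given in the item, packaged for illustration.

```python
from fractions import Fraction


def defective_fraction_second_batch(total=2000, overall=Fraction(1, 5),
                                    first_size=1000, first_rate=Fraction(1, 4)):
    """Fraction defective in the second part of the batch, given the
    overall defective rate and the rate in the first part."""
    defective_total = overall * total          # 400 tubes defective in all
    defective_first = first_rate * first_size  # 250 of them in the first 1000
    second_size = total - first_size           # 1000 tubes remain
    return (defective_total - defective_first) / second_size


print(defective_fraction_second_batch())  # 3/20
```

The 150 remaining defective tubes out of the second 1000 give 3/20, which is choice (C).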


The first three parts of the battery are related directly to
standing in medical school. The “understanding society” section
is not related to medical knowledge, but is included in an
attempt to select candidates for medicine who will be successful
in adapting to the needs of the time. In their instructions to candi-
dates, the authors write that “the test is intended to complement
other data (your total college record, interviews, references and
recommendations) with an objective inventory of your skills,
concepts and information . . . acquired from formal study and
from experience.”*


Law School Admission Test** 
Description. This battery is designed for use in selecting the
best candidates from among those applying for law school. The
battery has six parts: principles and cases, data interpretation,
reading comprehension, debates (the examinee determines
whether a statement supports, refutes, or is irrelevant to a given
resolution), best arguments, and paragraph reading. Some of the
material is difficult. The test battery has a correlation of about .50
with law school grades. When combined with college marks,
it is highly predictive of success in law school.
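Why does combining the battery with college marks raise prediction? The standard two-predictor multiple-correlation formula, R² = (r1y² + r2y² − 2·r1y·r2y·r12) / (1 − r12²), shows that the gain depends on how well each predictor correlates with law grades and on how much the two predictors overlap. In the sketch below only the .50 for the test comes from the text; the .50 for college marks and the .30 intercorrelation are hypothetical values chosen for illustration.

```python
import math


def multiple_r(r1y: float, r2y: float, r12: float) -> float:
    """Multiple correlation of a criterion y with two predictors.

    r1y: predictor 1 (aptitude test) vs criterion
    r2y: predictor 2 (college marks) vs criterion
    r12: predictor 1 vs predictor 2
    """
    r_squared = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Test vs grades (.50) is from the text; the other two values are hypothetical.
print(round(multiple_r(0.50, 0.50, 0.30), 2))  # 0.62
```

Were the test and the marks uncorrelated (r12 = 0), the same formula would give about .71; the less the two predictors overlap, the more each adds to the forecast.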


* Medical College Admission Test, Bulletin of Information, Educational
Testing Service, Princeton, N. J., 1957, p. 22.
** Published by the Educational Testing Service, Princeton, N. J.

Pre-Engineering Ability Test**
Description. This test consists of two sorts of material: (a)
comprehension of scientific materials, and (b) general mathe-
matical problems designed to measure competence in this area.
The first part of the test involves reading scientific prose, tables,
and graphs, and answering questions based upon these materials.
The second part consists of problems in arithmetic, algebra, and
geometry. The Pre-Engineering Test correlates about .50 with
grades in the first term of engineering school. The reliability
of the battery is high.
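A validity coefficient of .50 is useful but far from a perfect forecast. The usual ways of expressing this are r² (the proportion of criterion variance accounted for), the coefficient of alienation k = √(1 − r²), the index of forecasting efficiency 1 − k, and the standard error of estimate SD·k. The sketch assumes a criterion standard deviation of 10 grade points, which is a hypothetical figure; only r = .50 comes from the text.

```python
import math


def forecast_summary(r: float, criterion_sd: float) -> dict:
    """Summarize how well a validity coefficient r forecasts a criterion."""
    k = math.sqrt(1 - r ** 2)  # coefficient of alienation
    return {
        "variance_accounted_for": r ** 2,
        "alienation_k": k,
        "forecasting_efficiency": 1 - k,  # proportional cut in error vs. predicting the mean
        "se_estimate": criterion_sd * k,  # typical error of a predicted grade
    }


# r = .50 is from the text; the criterion SD of 10 grade points is hypothetical.
summary = forecast_summary(0.50, 10.0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With r = .50 the errors of prediction are only about 13 per cent smaller than those made by predicting the average grade for everyone, which is why such a test is best used together with other evidence.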


National Teacher Examination 
Description. These examinations are planned for use by school
systems as an aid in the selection of teachers, and they are em-
ployed also by teacher-training colleges as a means of evaluating
their students. The examinations are constructed, administered,
and scored by the Educational Testing Service. Their objective
is the measurement of professional background, general intelli-
gence, and general culture. There are two parts of the battery:
four common examinations and a series of optional examinations.
The first set covers a student’s general background for teaching,
and the second his mastery of some special field.


The common examinations comprise the following sub-tests: 


Professional information: Child development, educational psychology,
guidance, measurement, principles and methods of teaching.
General culture: Sections on science and mathematics and on literature,
history, and the fine arts. Examinations cover the development and
current state of affairs in these fields.

English expression: Grammatical errors to be detected in sentences. 

Non-verbal reasoning: A pattern completion test in which the examinee 
must fathom the relationships in a given figure and choose the correct 
figure to complete the pattern. 

The optional examinations cover eight areas of specialization:
education in elementary schools, early child education, biological
sciences, English, industrial arts, . . . . The common examinations
have exhibited substantial relationships with ratings for effectiveness
of teachers by supervisors. The tests do not measure . . .
factors, interest, or drive.
TESTS OF ARTISTIC APTITUDE OR TALENT

Tests in this area are concerned with finding whether an
examinee possesses some of the factors which appear to be neces-
sary for success in music or in art. So many traits contribute to
the success of an artist or a musician that it is impossible for an
aptitude test to do more than tap some of the more obvious com-
ponents. Perhaps the best the aptitude tests can do in many in-
stances is to aid the counselor in steering away from the arts those
aspiring students who have no real talent and whose money and
time might be better spent in other pursuits.
It is doubtful whether the classroom teacher will have the
time, the training, or the equipment needed to administer and
interpret the aptitude tests in this area. Teachers engaged in
guidance should be familiar with such tests, however: with what
they are and what they are trying to do. Two tests of music
and one of art will be described in this section. They are repre-
sentative of aptitude measures in this field.
Seashore Measures of Musical Talents
Diagnostic Tests of Achievement in Music
Meier Art Judgment Test

Seashore Measures of Musical Talents*

Description. This is a test of “ear for music.” The test battery
consists of six separate tests covering such attributes of tone as
pitch discrimination, loudness, rhythm, time, timbre, and tonal
memory. The tests are given by means of phonograph records.
Each test item or problem presents a pair of tones or a tonal
sequence. In the second playing, one of the tones is changed,
or the sequence of tones is altered in some way. In the pitch
discrimination test, the examinee marks on a test sheet whether
the second tone is higher (H) or lower (L) than the first. Com-
parisons become progressively more difficult as the difference
in pitch between the two tones decreases. In the time and loud-
ness tests, the second, or comparison, tone differs in strength
or in pattern from the first. The rhythm test requires the exam-
inee to decide whether the second of two patterns is the same as
or different from the first. The timbre and tonal memory tests
differ somewhat from the others. In the first, two tonal patterns
are compared for quality (consonance); in the second, a short
series of from three to five tones is played, and then played a
second time with one note changed. The subject must write
down the number of the altered note. The stimuli (tones) pre-
sented by the phonograph records are as pure (uncomplicated)
as possible.

* Published by The Psychological Corporation, New York, N. Y.


Scope. The Seashore Tests are applicable from the fifth grade 
on. 

Scoring. Scores from the six sub-tests are plotted on a profile
to give a graphic representation of performance. Percentile norms
are available for fifth and sixth graders, seventh and eighth grad-
ers, and adults. The Seashore Tests have been used in schools of
music and in music courses in academic schools. The tests ad-
mittedly do not run the gamut of musical talent, but they do
measure important aspects of musical aptitude. A child who ranks
low on these tests has a poor ear for music and is a doubtful
selection for extensive musical training. The reliability of the
battery runs about .80.
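A reliability of about .80 can be translated into a margin of error around an obtained score through the standard error of measurement, SEM = SD·√(1 − r). The sketch assumes a sub-test standard deviation of 5 raw-score points and an obtained score of 40; both are hypothetical, since the text gives only the reliability.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement from the score SD and the reliability."""
    return sd * math.sqrt(1 - reliability)


def score_band(obtained: float, sd: float, reliability: float, z: float = 1.0):
    """Band of +/- z SEMs around an obtained score (z = 1 covers about two cases in three)."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained - z * sem, obtained + z * sem


# Hypothetical sub-test: SD of 5 points, reliability .80, obtained score 40
low, high = score_band(obtained=40, sd=5, reliability=0.80)
print(f"{low:.1f} to {high:.1f}")  # 37.8 to 42.2
```

When points on a profile are compared, small differences that fall within such bands should not be taken too seriously.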


Diagnostic Tests of Achievement in Music*

Description. As the name implies, this test battery is designed
to find how well students have acquired the theory and tech-
nical knowledge needed to read and understand music. The
test consists of ten parts: diatonic syllable names, chromatic
syllable names, number names, time signatures, major and minor
keys, note and rest values, letter names, signs and symbols, key
names, and song recognition. Test content is based on materials
recommended by musical authorities as fundamental in musical
education. A piano is required for the tests.

* Published by the California Test Bureau, Los Angeles, Calif.

Scope. For grades 4 through 12. Test items are graded up
sharply in difficulty.

Scoring. Norms for the test are based on the degree of mastery
shown by students for the various sorts of material. Strengths and
weaknesses are revealed by comparison of scores upon the ten
parts of the test. The reliability of the whole test over the rather
wide range for which it is applicable is very high. Reliability for
separate grades is lower, but probably satisfactory. Working
time for the test is about sixty minutes. The Diagnostic Tests
are a useful supplement to “ear” tests like the Seashore. An ear
for music is necessary for any musical activity, whereas a knowl-
edge of the technical aspects of music is necessary for one
aspiring to be a musician.

Meier Art Judgment Test*

Description. This test consists of a hundred problems in each of
which an artistic judgment is demanded. Each test item is pre-
sented in two versions. In the first version, there is a painting or
drawing by some well-known artist, or an acknowledged artistic
design; in the second, the same theme is presented but in altered
form, the change being in symmetry, balance, unity, or rhythm.
All pictures are in black and white, so that no complication is
introduced (nor any clues) by color. The examinee is told that
the two versions of the picture differ and is asked to select the
better version. The test is, accordingly, a measure of aesthetic
judgment, the criterion being the consensus of experts in art.

See Figure 6-6 (facing page 150).
Scope. The Meier test is intended for junior and senior high
schools, as well as colleges and art schools.

* Published by the Bureau of Educational Research and Service, University of
Iowa, Iowa City, Iowa.

Scoring. Norms are for students in art courses. High scores do
not necessarily mean that the student is destined to be an artist.
But a low score should be a warning signal to one planning a
career in art. The Meier test has correlations of from about .45
to .50 with grades in art courses, but low correlation with scores
on verbal intelligence tests. This does not mean that artists are
unintelligent, but that many factors besides abstract ability must
enter into artistic appreciation. The reliability of the test is
about .75 for fairly homogeneous groups.


HOW TO JUDGE AN APTITUDE TEST

Like tests of intelligence and of educational achievement, apti-
tude tests must be judged by the adequacy of their validation,
reliability, scaling, and norms. Various comments concerning
these aspects of the tests described in this chapter have been made
in appropriate places. This and other material will now be
summarized.


Validity. Aptitude tests generally possess content validity. Tests
of speed, dexterity, seeing mechanical relations, solving mechani-
cal problems, and the like seem proper for measuring mechanical
aptitude. Moreover, tests of sorting, writing, reading, and alpha-
betizing appear to be appropriate for assessing clerical ability.
In the tests of professional aptitude and in those of talent, the
content has been chosen with a view toward forecasting per-
formance in school and (hopefully) in life.

Aptitude tests have been validated . . . including grades in courses
and su . . . ; measures of practical or working validity have . . .
for the Minnesota Paper Form Board . . . DAT.

Reliability. The reliability . . . generally satisfactory, and we can
have confidence . . . of a score. In some cases, the standard error . . .
Scaling. In most aptitude tests, raw scores are converted into
percentile ranks. In some tests (the DAT, for example) scaled
scores are used on the profile in making comparisons of a given
student’s scores.
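The text does not say how any particular battery's scaled scores are derived. One common kind of scaled score is the normalized standard score (often called a T score), with a mean of 50 and a standard deviation of 10, obtained by reading a percentile rank through the normal curve. The sketch below assumes that scale; the function name `normalized_t_score` is hypothetical and is not the DAT's published procedure.

```python
from statistics import NormalDist


def normalized_t_score(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to a
    normalized standard score with mean 50 and SD 10."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)  # z score at that percentile
    return 50 + 10 * z


for pr in (16, 50, 84, 98):
    print(pr, round(normalized_t_score(pr)))
```

Unlike percentile ranks, such scores are spaced in equal standard-deviation units, which is what makes differences between points on a profile comparable from one sub-test to another.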


Norms. All aptitude tests have norms either in percentiles or in
scaled scores. A few tests give norms for certain occupational
groups. One drawback to the use of vocational aptitude tests,
however, is the lack of adequate norms in many job areas. There
is a need for information regarding the predictive value of pro-
fessional and vocational tests for persons long out of school.
It would be a great advance if we knew how well an aptitude
test could forecast the success of engineers or lawyers, for ex-
ample, and not simply grades in courses.


SUGGESTIONS FOR READINGS

Anastasi, A. Psychological Testing. New York: Macmillan, 1954.

Cronbach, L. J. Essentials of Psychological Testing. New York:
Harper, 1949.

Greene, E. B. Measurements of Human Behavior (Rev. edition). New
York: Odyssey Press, 1952.

Noll, V. H. Introduction to Educational Measurement. Boston:
Houghton Mifflin, 1957.


SUGGESTIONS FOR LABORATORY WORK

1. Administer several standardized aptitude tests to the class. Cut the
time allowance if necessary. Students should score their own tests and
. . . when called for.

2. . . . the Manual the specifications which the author lays down
. . . aptitude test. Examine the items of the test. Do you agree that
the test has content validity? Are any data given on experimental
validity?

3. Make a study of the Differential Aptitude Tests. Analyze the battery
for validity, reliability, scaling procedures, and norms.
QUESTIONS FOR DISCUSSION
1. How do aptitude tests differ from readiness tests in purpose and
content?

2. Why does a test battery like the Minnesota Clerical Test vary
greatly in the accuracy of its forecasts of office work?

3. Why are aptitude tests used more often in high schools than in
elementary schools?

4. Give some reasons why paper-and-pencil tests of mechanical
ability are as useful as are work samples in determining aptitude.

5. How would you set up a program for selecting candidates for a
nursing school? Outline the procedures you would use.

6. Would you use the Bennett Mechanical Comprehension Test to
select workers in an automobile factory?

7. It has been said that the best measure of aptitude for mathematics
(or for any subject) is the achievement to date. Do you agree?

8. How could you discover whether the Meier Art Test is measuring
native artistic ability and not training in art?

9. A girl of 16 scores very high on the Seashore Music Test. Would
you advise her to undertake a career in music? Why or why not? What
else might you need to know about her?

10. How can a follow-up study of graduates of law and medical schools
be useful to a counselor using a professional aptitude test?


CHAPTER 7


PERSONALITY TESTS


In previous chapters, we have indicated on several occasions
that prediction of success based upon measures of intelligence,
school achievement, and aptitude must always be qualified by the
statement “provided the personality traits are favorable.” In the
present chapter, we shall attempt to see how well we can deter-
mine favorable and unfavorable personality traits.

There are a number of descriptions of personality, and the
usefulness of any definition will depend in most cases upon the
purposes of the author. For the teacher or school counselor, a
practical working definition is to the effect that personality is a
student’s characteristic way of doing things. Suppose that two
boys, John and Jim, are about the same age, have about the same
IQ, and do about the same caliber of school work. But suppose
further that John is friendly, highly motivated, and likeable; that
. . . these two boys arises from their distinctive personality traits,
not from their differences in mental ability. Failure in school,
or in life, is often the . . . use of his potential personal assets.
Obviously, lack of success may arise either from negative
(unpleasant) personality traits or from failure to make use of
positive (pleasant) personality traits.

. . . cases of severe personality . . . drastic emotional disturbance
. . . inventories, on the other hand, can be . . . interpreted in a
useful way by teachers and counselors.


RATING SCALES

The rating scale is a device for obtaining . . . degree to which an
individual . . . situation, rating scales provide appraisals of a
teacher (or of a candidate for a teaching position) in . . . ratings
by teachers or principals are often re . . . seeking entrance to
college or looking for a job. In . . . rating, the judge expresses his
opinion by marking . . . graduated scale or by checking in the
. . . describes the person being rated.


FIGURE 7-1 Sample Items from Various Graphic Rating Scales

1. From a graphic rating scale for clerical workers:
Accuracy. Consider carefully quality of work, freedom from error.
No errors | Very careful | Few errors | Careless | Many errors

2. From a behavior rating scale for children:
Is his attention sustained?
(5) Distracted: jumps rapidly from one thing to another.
(4) Difficult to keep at a task until completed.
(3) Attends adequately.
(2) Is absorbed in what he does.
(1) Able to hold attention for long periods.

3. From the American Council on Education Rating Scale for
prospective college students:
Does he get others to do what he wishes?
Probably unable to lead his fellows.
Lets others take lead.
Sometimes leads in minor affairs.
Sometimes leads in important affairs.
Displays marked ability to lead his fellows; makes things go.

4. From a rating device for teacher candidates:
(Put a check under the appropriate heading)
Very inferior | Inferior | Average | Superior | Very superior
Tact

5. From a rating scale for teachers:
(Circle the number which best indicates the degree or extent to
which the qualities are practiced.)
0 = unsatisfactory; 1 = below average; 2 = average;
3 = above average; 4 = superior.
Emotional maturity:
To what extent does the teacher exhibit desirable
balance between emotional responsiveness and emotional
control? Consider disposition, sense of humor, restraint
and thoughtfulness in dealing with others, feelings of
security, objectivity of interest, freedom from excessive
fears and worries, and warmth of feeling and expression.

6. From a rating scale for officer candidates:
Relations with fellow candidates:
Uncooperative.
Grudgingly cooperative.
Cooperates willingly.
Cooperates and contributes good ideas.
Leads and cooperates.
Perhaps the most useful type of rating device is the Graphic
Rating Scale or some variation of it. The typical graphic scale
consists of a straight line, for example, five inches long, which is
taken to represent the range of behavior in the trait. In lieu of a
line, several categories representing gradations in the trait may
be provided. The illustrations in Figure 7-1 are samples from
various rating scales.


Units on the . . . successive scale divisions . . . the distance of the
judge’s check from the low . . . . A more summary method is also
used: if there are five main divisions on the scale, the highest
division may be designated “1,” the next division “2,” and so on
down.

. . . the judge may provide observations which . . .

A good scale avoids terms which are . . . of activities, for example,
“standing in the . . .” or “moral qualities.” By the same . . . avoids
narrow, specific terms. The dean of . . . knows intimate details
about a teacher: whether he sings in the choir, loves his mother,
or plays golf well. Information of this sort is often called for,
however. Judges do not often have occasion to observe or to learn
about personal behavior and, in general, should not be asked to
supply such information.

The number of divisions on the scale should be neither too
numerous nor too few. The optimum number of divisions on a
graphic scale is perhaps five to seven. Fewer than five divisions
make the groupings too coarse; more than seven divisions
demand fractionings of the trait which are too fine for most
raters, with the result that a large part of the scale may be unused.
A five-division scale is popular, since it corresponds to the mark-
ing system A, B, C, D, and E. Furthermore, the five categories
“high,” “above average,” “average,” “below average,” and
“poor” seem to mark off fairly natural divisions.
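The scoring of a graphic scale described in this section can be sketched as follows: a check placed somewhere along a five-inch line is assigned to one of five equal divisions, the highest division being numbered 1, the next 2, and so on down, and each division may then be matched to a letter mark. The function names, and the assumption that division 1 corresponds to the mark A, are illustrative only.

```python
def division_from_check(distance_from_low_end: float, line_length: float = 5.0,
                        divisions: int = 5) -> int:
    """Division number for a check mark, numbering the highest division 1."""
    if not 0 <= distance_from_low_end <= line_length:
        raise ValueError("check mark must lie on the line")
    width = line_length / divisions
    # Index counted from the low end (0 = lowest division); a check exactly
    # at the top end is kept in the highest division.
    index_from_low = min(int(distance_from_low_end // width), divisions - 1)
    return divisions - index_from_low


LETTER_MARKS = "ABCDE"  # assumed mapping: division 1 -> A, ..., division 5 -> E


def letter_mark(division: int) -> str:
    """Letter mark for a division on a five-division scale."""
    return LETTER_MARKS[division - 1]


for inches in (4.6, 2.5, 0.3):
    d = division_from_check(inches)
    print(inches, d, letter_mark(d))
```

Measuring the distance of the check directly gives a finer score; grouping it into divisions trades that precision for simpler tabulation.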

Directions to the rater should be explicit. The adequacy of the
directions given the rater will have a substantial effect on the
validity and reliability of his ratings. The rater (1) should be
given as explicit directions as possible, (2) should be told what
is meant by the distribution of a trait, and (3) should be warned
against assigning too many “average” ratings. This last warning is
sometimes needed when the persons to be rated are not well known
to the rater, when the meaning of the traits is not well under-
stood, and when the rater is overcautious. Raters must be warned
against the “halo effect” and the tendency to see logical relations
among traits: to assume, for example, that intelligence and moral
behavior or intelligence and good work habits of necessity go
together.


As for the distribution of traits, the best first hypothesis (in lieu
of other information) is to assume that ratings will be distributed
in the form of a normal curve. When the baseline of the normal
curve is subdivided into five equal parts, the percentages in the
divisions (reading from either end of the curve) are 7, 24, 38,
24, and 7. If there are seven divisions on the scale, the per-
centages in the subdivisions are 4, 10, 22, 28, 22, 10, and 4. The
direc . . . on a five-division scale, we must expect many more in
the . . . many high ratings (the “generosity factor”). Stress on
the . . .

. . . person is disliked, there is a tendency . . . all traits. To
minimize halo, the r . . .

. . . pendently that . . . ic manner, our . . . only one supervisor
had so stated. In general, confidence increases with the
number of agreements when ratings are made independently.
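The normal-curve percentages quoted in this section can be checked with a few lines of code. The text does not state how wide each division is; the sketch assumes equal-width divisions centred on the mean, with the two end divisions absorbing the tails. Divisions one standard deviation wide reproduce the five-division figures exactly, and a width of about 0.72 standard deviation happens to reproduce the seven-division figures; the function name `expected_percentages` is hypothetical.

```python
from statistics import NormalDist


def expected_percentages(divisions: int, width_in_sd: float) -> list[int]:
    """Expected percentage of ratings in each of `divisions` equal-width
    categories centred on the mean; the end categories absorb the tails."""
    nd = NormalDist()
    half = divisions * width_in_sd / 2
    # Interior boundaries in SD units, from the low end to the high end
    cuts = [-half + i * width_in_sd for i in range(1, divisions)]
    cdfs = [0.0] + [nd.cdf(c) for c in cuts] + [1.0]
    return [round(100 * (hi - lo)) for lo, hi in zip(cdfs, cdfs[1:])]


print(expected_percentages(5, 1.0))   # [7, 24, 38, 24, 7]
print(expected_percentages(7, 0.72))  # [4, 10, 22, 28, 22, 10, 4]
```

A rater whose own distribution departs sharply from such figures (for example, one who puts most ratees in the top two divisions) is showing the generosity tendency the text warns about.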

It can be shown that if the estimates of two judges correlate 
.60, then the average of these ratings will correlate .75 with those 
of two equally good judges. It is, of course, difficult to decide 
when two judges are "equally good." We can never guarantee 
this to be true, but we can (a) select judges who at least know
the ratees well, (b) provide careful definitions of the traits to be 
rated, and (c) allow for individual differences in rating standards. 
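The figure of .75 follows from the Spearman-Brown formula, r_n = n·r / (1 + (n − 1)·r), with r = .60 and n = 2 judges. The same formula shows how confidence grows as more independent ratings are pooled, on the assumption, stressed above, that all the judges are equally good. The function name is illustrative.

```python
def pooled_reliability(r_single: float, n_judges: int) -> float:
    """Spearman-Brown: reliability of the average rating of n equally good judges,
    given the correlation r_single between any two of them."""
    return n_judges * r_single / (1 + (n_judges - 1) * r_single)


for n in (1, 2, 4, 8):
    print(n, round(pooled_reliability(0.60, n), 2))
```

With judges agreeing at .60, pooling two raises the figure to .75 and pooling four to about .86; the gains shrink as more judges are added.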


Summary on Rating Scales
Ratings from graphic scales will generally deserve confidence
when:
1. Qualities which can be observed in behavior are rated.
Energy, appearance, and teaching skill are better rated than
are character and moral traits.
2. Characteristics to be rated are illustrated. The use of behavior-
grams (page 160) and instances will strengthen the ratings.
3. Raters have actually observed the persons to be rated in
situations where personality might be revealed.
4. Independent ratings are pooled.
5. Judges are confident that the ratings are valuable.
6. Different standards are accounted for by explicit directions
or by statistical techniques.

The above rules are perhaps most useful when one has the
problem of constructing a rating device, and they may not seem
to be very helpful to the teacher who is faced by a ready-made
scale. Teachers and supervisors rarely have the responsibility for
devising a rating scale. But teachers are rated by supervisors, and
supervisors by principals. Moreover, students are rated by
teachers and by principals for personality traits judged important
by colleges or prospective employers. Hence, the teacher should
be familiar with how the rating scale is put together and how
it works. Raters can improve a rating device by offering crit-
icisms and commendations. Eventually comments of this sort
should lead to a better scale.


QUESTIONNAIRES AND PERSONALITY
INVENTORIES

. . . to solve problems, but is asked to express opinions, preferences,
and feelings. Questionnaires have been developed by psychol-
ogists for use in three main areas: (a) personal-social behavior,
. . .

. . . rately, perhaps, maladjustment) is revealed by a person’s self-
report of his worries, fears, feelings of insecurity or of depression,
. . .

. . . (for example, war, freedom of speech, and internationalism).
Finally, the interest . . . with preferences for occupations, people,
school subjects (such as physics or history), books, sports, hobbies,
and avocations.

. . . asked (among other questions) whether h . . . bookkeeper or an
airline pilot . . . indirect . . . . In the indirect form of the question,
the assumption made is that an examinee is less likely to fake or
rationalize his answers when he is not sure what motive or what
personality trait the inventory is trying to uncover.

The Personality Inventory

Personality inventories were used by the armed forces in
World Wars I and II to screen out the maladjusted and those
likely to become mentally ill. These personal data (or PD)
sheets consisted of lists of symptoms reported by men who sub-
sequently had suffered from “nervous breakdown” or were classi-
fied as psychoneurotic. Adult questionnaires have been revised
by deleting the more serious and disturbing items so that they
could be used in the schools. Questions which were removed
deal with the more reprehensible forms of adult behavior, such
as those involving liquor and sex offenses. In the schools, the
questionnaire is used to locate pupils with potentially handi-
capping personality problems. The acceptability of a PD sheet
for pupils, parents, and the community is necessary if the inven-
tory is to be used generally as a group test. A teacher will be well
advised to make sure that the inventory he proposes to use has
the approval of the school authorities. It is important that the
reading level demanded by an inventory be carefully scrutinized,
since many items may not be understood.

The personality inventory is most valuable in the schools for
counseling and guidance, that is, for spotting pupils with existing
or potential personality difficulties. When used individually
and in face-to-face contacts, the PD Sheet is more flexible and
becomes essentially a directed interview. Answers given by the
student can be pursued further until their meaning is clear. This
cannot be done, of course, when the inventory is administered
in group form. Of the personality inventories available (many
cover the same ground), the following represent acceptable
"tests" for use in the schools:

California Test of Personality
Pintner's Aspects of Personality
Gordon's Personal Profile and Personal Inventory
Bell's Adjustment Inventory
Thurstone Temperament Schedule

Each of these questionnaires will be considered in this section.
California Test of Personality*


Description. This test series runs the gamut from the elementary 
grades to adulthood. Each battery is divided into two sections 
designed to measure (1) personal adjustment and (2) social ad- 
justment. The six sub-tests in section 1 are designed to bring out 
how a student thinks and feels about himself, his feelings of con- 
fidence and adequacy, his tendencies to withdraw within himself 
and to exhibit nervous symptoms. In section 2, the six sub-tests 
question the examinee on his knowledge of social standards, his 
social skills, his freedom from anti-social attitudes, and his rela- 
tions to family, friends, and the community. The questions are 
Yes-No in form. Figure 7-2 gives samples from the test. 


Scope. There are five separate test batteries:

Primary Series, kindergarten to grade 3
Elementary Series, grades 4-8
Intermediate Series, grades 7-10
Secondary Series, grades 9-college
Adult Series

Over-all time for administering a test battery is approximately 50
minutes.

* Published by the California Test Bureau, Los Angeles, Calif.

FIGURE 7-2 Sample Items from the California Test of Personality,
Elementary, Grades 4-5-6-7-8, Form AA

PERSONAL ADJUSTMENT (Circle YES or NO)

10. Do your parents or teachers usually need to tell you to
    do your work?
23. Do people often think that you cannot do things well?
25. Do you feel that your folks boss you too much?
38. Are you proud of your school?
50. Would you rather stay away from most parties?
68. Do you often feel tired before noon?

SOCIAL ADJUSTMENT

77. Is it necessary to thank those who have helped you?
87. Do you help new pupils to talk to other children?
101. Do people often act so mean that you have to be
     nasty to them?
114. Do you like both of your parents about the same?
123. Is it fun to do nice things for some of the other
     boys or girls?
139. Do you try to get friends to obey the law?

Reproduced by permission of the California Test Bureau.

Scoring. Answers can be recorded in the test booklet itself or
on a prepared answer sheet. Scoring is objective and easy. A
profile of the different scores and over-all adjustment score can
be constructed. The pupil's earned score (point score) is entered
opposite the personality component, and the percentile rank
corresponding to this score is found in the appropriate table.
Percentile ranks for total personal adjustment and for total social
adjustment may also be entered.
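The conversion just described, from a point score to a percentile rank by table lookup, can be sketched in Python. This is a minimal illustration under stated assumptions: the raw-score bands and percentile ranks in `NORMS` are invented, the component names are placeholders, and a real conversion would use the table printed in the Manual for the battery and grade in question.

```python
from bisect import bisect_right

# Hypothetical norms for one component: (lowest raw score in band, percentile rank).
# Invented values for illustration, not published California norms.
NORMS = [(0, 1), (6, 10), (9, 25), (11, 50), (13, 75), (14, 90), (15, 99)]


def percentile_rank(raw_score: int, norms=NORMS) -> int:
    """Look up the percentile rank for the band that contains raw_score."""
    floors = [floor for floor, _ in norms]
    i = bisect_right(floors, raw_score) - 1  # last band whose floor <= raw_score
    if i < 0:
        raise ValueError("raw score falls below the lowest band in the table")
    return norms[i][1]


# A pupil's point scores on two components, entered on a profile together
# with their percentile ranks (one shared table is used here for brevity).
pupil = {"component A": 12, "component B": 8}
profile = {name: (score, percentile_rank(score)) for name, score in pupil.items()}
```

In practice each component has its own norms; sharing one table here only keeps the sketch short.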

The reliability of the five batteries is quite high (.80-.94). 
Percentile norms are provided for each sub-test and for the 
battery as a whole. This inventory is a useful indication of a 
pupil's all-around adjustment. Diagnosis from the sub-tests is sug- 
gestive rather than conclusive; but many valuable clues which 
serve to explain a child's behavior may be obtained. 


Aspects of Personality (Pintner)* 

Description. This inventory consists of three parts: ascendance-submission,
extroversion-introversion, and emotionality. It was
designed to aid the classroom teacher in locating children who
have developed (or are likely to develop) serious behavior problems.
Samples of the kind of items found in the test are as follows:

I [...]                                              Same-Different
I [...] the class                                    Same-Different
I feel tired most of the time                        Same-Different
When a child tries to push into line ahead of me, I
  am not afraid to tell him to get back              Same-Different

The pupil indicates his agreement or disagreement with a
statement by marking or circling "same" or "different."
* Published by the World Book Company, Yonkers, N. Y. 
Scope. Aspects of Personality is intended for use in the elementary
grades and junior high school.


Scoring. This inventory is readily scored by means of a stencil.
There are separate norms for boys and girls, and for two levels
of maturity. A low score on the ascendance-submission part
indicates a shy child, a high score an aggressive one. High scores on
extroversion-introversion suggest good adjustment; low scores
suggest withdrawing tendencies and daydreaming. Low scores
on emotional stability suggest flightiness and lack of control. The
total score is a rough index of personal adjustment, and probably
only wide deviations should be investigated. The test often
provides useful clues.
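Scoring by stencil amounts to counting, part by part, the answers that agree with a key. A minimal sketch, assuming a tiny invented key (the item numbers, keyed answers, and part assignments are hypothetical, not Pintner's):

```python
# Hypothetical key: item number -> (part, keyed answer). A response earns one
# point on its part when it matches the keyed answer, which is what laying a
# stencil over the answer column accomplishes.
KEY = {
    1: ("ascendance-submission", "Same"),
    2: ("ascendance-submission", "Different"),
    3: ("extroversion-introversion", "Same"),
    4: ("emotionality", "Different"),
    5: ("emotionality", "Different"),
}


def stencil_scores(responses, key=KEY):
    """Count keyed responses in each part; omitted items earn nothing."""
    scores = {part: 0 for part, _ in key.values()}
    for item, (part, keyed) in key.items():
        if responses.get(item) == keyed:
            scores[part] += 1
    return scores


pupil = {1: "Same", 2: "Same", 3: "Same", 4: "Different"}  # item 5 omitted
parts = stencil_scores(pupil)
total = sum(parts.values())
```

The part scores would then be interpreted against the separate norms for boys or for girls at the pupil's maturity level.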


Personal Profile and Personal Inventory (Gordon)* 


Description. The Personal Profile is designed to measure four
fairly distinct personality traits: (a) ascendancy, (b) responsibility
(perseverance or reliability), (c) emotional stability, and
(d) sociability. The examinee is asked to indicate which of four
statements (there are eighteen sets) is most descriptive of himself
and which is least descriptive. A specimen set is:

Able to make important decisions without help
Does not mix easily with new people
Inclined to be tense or high strung
Sees a job through despite difficulties

Each of these phrases is descriptive (positively or negatively)
of one of the four traits included in the inventory.

This personality questionnaire uses what has been called the
"forced-choice" technique; that is, the examinee is instructed to
choose among four statements, two of which appear to be equally
acceptable and two equally unacceptable. (See the description of
the indirect questionnaire on page 164.) This method of presenting
items has certain advantages. If the two choices are fairly
well equated for social value, it is hard for the examinee to fake
his answer, since he does not know clearly what is behind either
choice. Again, the use of forced choices reduces hesitation and
indecision, since the examinee is required to make a decision
rather than simply choosing between Yes and No. If the examinee
likes none of the choices, he may select the least objectionable.
In addition to the four trait scores, the Personal Profile yields a
total score, which may be represented graphically along with
the four part scores. Very low total scores have been found to
be associated with maladjustment and poorly developed
personality.
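The forced-choice sets can be scored mechanically once each statement is keyed to a trait. The sketch below is hypothetical throughout: the trait and direction assigned to each specimen statement are inferred from the wording of the phrases, and the rule (the "most" choice adds the statement's direction to its trait, the "least" choice subtracts it) is an assumed illustration, not Gordon's published scoring key.

```python
# One forced-choice set. The trait keyed to each statement and its direction
# (+1 = endorsing it shows more of the trait, -1 = less) are assumptions.
SPECIMEN_SET = {
    "Able to make important decisions without help": ("ascendancy", +1),
    "Does not mix easily with new people": ("sociability", -1),
    "Inclined to be tense or high strung": ("emotional stability", -1),
    "Sees a job through despite difficulties": ("responsibility", +1),
}


def score_set(statements, most, least):
    """Score one set under the assumed rule: the 'most' choice adds its
    statement's direction to that trait; the 'least' choice subtracts it."""
    if most == least:
        raise ValueError("'most' and 'least' must be different statements")
    scores = {}
    trait, sign = statements[most]
    scores[trait] = scores.get(trait, 0) + sign
    trait, sign = statements[least]
    scores[trait] = scores.get(trait, 0) - sign  # rejecting a negative item helps
    return scores


def total_scores(answered_sets):
    """Accumulate trait scores over (statements, most, least) triples."""
    totals = {}
    for statements, most, least in answered_sets:
        for trait, points in score_set(statements, most, least).items():
            totals[trait] = totals.get(trait, 0) + points
    return totals


result = score_set(
    SPECIMEN_SET,
    most="Sees a job through despite difficulties",
    least="Inclined to be tense or high strung",
)
```

Because only two of the four statements are marked in each set, an examinee cannot raise every trait at once, which is the point of the forced-choice design described above.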

The Personal Inventory also covers four traits: caution, original
thinking, personal relations, and vigor. The total score depicts
the student's personal development in these areas.


Scope. The Profile and Inventory are designed for high schools,
colleges, and adults. 


Scoring. Percentile norms are available for each scale, for boys
and girls separately, for high school, and for college. The four
scores and the total may be represented graphically on a profile.
These questionnaires have considerable validity, as is shown in
follow-up studies. Together, the two inventories are useful in
counseling students and in screening out those with potential
behavior problems. Reliabilities of the sub-tests and of the total
are satisfactory.


Adjustment Inventory (Bell)* 

Description. This well-known inventory consists of questions 
to be answered Yes, No, or ?. It has been designed to estimate 
personal adjustment in four areas: home (satisfactions and dis- 
satisfactions), health (illness and general well-being), social 
relations (shyness, aggressiveness, and so on), and emotional be- 
havior (self-confidence, depression, and the like). Samples of 
the kinds of items in the questionnaire are:

Are you troubled with shyness?     Yes  No  ?
Do you daydream frequently?        Yes  No  ?
Are you often low in spirits?      Yes  No  ?


The Bell inventory has proved useful chiefly in locating students 


who need counseling. It provides valuable leads to social and 
personal maladjustment. 


Scope. The student form of the Bell is for high school and 


college students. There is a form for adults which may be used in 
vocational counseling. 


Scoring. The inventory is not timed, but ordinarily requires 
about twenty-five to thirty minutes. The over-all reliability is 


high. There are percentile norms for high-school and college
students and for men and women.


Thurstone Temperament Schedule* 


Description. This inventory consists of a set of 140 questions, 
twenty items being grouped under each of seven aspects of 
temperament or emotional expression. Adjectives describing the 
seven temperamental traits are: active (degree of energy),
vigorous (participation in physical activities, sports), impulsive
(happy-go-lucky), dominant (aggressive, forthright), stable
(emotionally), sociable, and reflective (thoughtful, meditative).
These seven behavior areas, which were identified through a
study of the intercorrelations of many personality variables, are
believed to constitute certain basic aspects of social behavior. The
inventory is well adapted for use with normal people: items
obviously bearing upon mental disease have been avoided.


Scope. For high schools, colleges, and adults.


Scoring. Percentile norms [...]
and the seven scores may be plotted on a profile for a study of
idiosyncrasies. The reliability of the whole inventory is high.
However, the reliabilities of the sub-sections are not high. Hence,
although the inventory may provide valuable clues to a counselor,
diagnosis based on part scores should be tentative.
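One way to see why twenty-item sub-sections can be much less reliable than the 140-item whole is the Spearman-Brown formula, which predicts the reliability of a test whose length is multiplied by a factor k: r_k = k·r / (1 + (k − 1)·r). This is a rough illustration, not something the text reports: the whole-inventory reliability of 0.90 is a hypothetical value, and the formula assumes the parts are parallel samples of one content, whereas the seven scales measure different traits.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test whose length is multiplied by k
    (k < 1 shortens it), assuming the items are parallel samples."""
    if not -1.0 <= r <= 1.0 or k <= 0:
        raise ValueError("need -1 <= r <= 1 and k > 0")
    return k * r / (1 + (k - 1) * r)


WHOLE_ITEMS, PART_ITEMS = 140, 20   # from the description of the Schedule
r_whole = 0.90                      # hypothetical reliability of the whole

r_part = spearman_brown(r_whole, PART_ITEMS / WHOLE_ITEMS)
print(f"predicted reliability of a 20-item part: {r_part:.2f}")  # 0.56
```

Under these assumptions a part one-seventh as long drops from .90 to roughly .56, which is the kind of gap that makes diagnosis from part scores tentative.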


ATTITUDE SCALES 


When we know that a man is a Socialist or a Christian Scientist,
we feel fairly sure that we can predict his answers to questions
dealing with politics or religion. An attitude is a consistent point
of view, a way of behaving toward an institution, a social group,
or toward personal, political, or religious issues or practices.
Attitudes may be fairly narrow or quite broad; and they may be
strongly or weakly held. In general, an attitude pivots around
strong likes or dislikes. A person's attitude toward drinking,
professional sports, popular music, or "eggheads," for example,
will be exhibited in expressions of opinion which are often
emotional.

Scales for measuring the spread and strength of attitudes have 
often been used by social psychologists but are rarely employed 
routinely in the schools. One of the most comprehensive lists of 
attitude scales (about thirty in all) has been constructed by 
L. L. Thurstone and his associates at the University of Chicago. 
These scales estimate the strength of one's attitude (on either 
the favorable or unfavorable side) toward such diverse matters 
as war, capital punishment, the church, and communism.

In this section we shall describe two scales both of which 
have been useful in high school and college. These are 

Ascendance-Submission Reaction Study (Allport)
Study of Values (Allport-Vernon-Lindzey) 


Ascendance-Submission Reaction Study*

Description. This questionnaire attempts to determine whether
a person characteristically dominates or is dominated in the
face-to-face contacts of everyday life. The A-S Reaction Study is
usually classified as a personality inventory, but it can just as
well (perhaps better) be described as an attitude scale, since it
tries to discover an individual's habitual way of behaving in
everyday social contacts. There are two forms of the test, one
for men and one for women. Each item presents a situation which
might readily be encountered in school, on the street, or in a
store or bus. From two to four possible responses are offered.
The examinee selects that option which most nearly represents
what he would ordinarily do. Choices range from aggressive to
submissive and are weighted in such a way as to differentiate
between the two attitudes. Scoring weights for the separate items
were determined experimentally, and the total score shows the
strength of the examinee's typical behavior on a dominance-submission
scale.
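The way experimentally determined option weights combine into a total can be sketched as follows. The weight table is invented for illustration (the real weights are built into the A-S scoring stencil); positive weights stand for ascendant choices and negative weights for submissive ones.

```python
# Hypothetical option weights: item number -> {option letter: weight}.
# Positive weights mark ascendant (dominant) responses and negative
# weights submissive ones; the magnitudes are invented.
WEIGHTS = {
    1: {"a": 2, "b": 0, "c": -2},
    2: {"a": -1, "b": 1},
    3: {"a": 3, "b": 1, "c": -1, "d": -3},
}


def as_total(choices, weights=WEIGHTS):
    """Sum the weights of the options chosen; unanswered items add nothing."""
    return sum(weights[item][option] for item, option in choices.items())


score = as_total({1: "a", 2: "a", 3: "b"})
```

In this sketch a positive total leans toward dominance and a negative one toward submission; the published percentile norms, not the sign alone, would place the score within a particular group.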


Scope. The A-S inventory is designed for use in high schools,
in colleges, and for adults.


Scoring. Scoring is by stencil, the answers being weighted +
(plus) for dominance and - (minus) for submissiveness. Percentile
norms are provided for high-school and for college students,
and for adult men and adult women. The A-S Study is
often useful in educational and vocational guidance. In many
occupations, such as nursing, teaching, library work, and clerical
jobs, a strongly dominant attitude is [...]
asset. On the other hand [...]
dominant behavior and self-con[...]


Study of Values (Allport-Vernon-Lindzey)*

Description. This questionnaire sets out to gauge the strength
of six basic attitudes, described as follows: theoretical (marked
by dominant interest in the discovery of truth, the rational
approach to life); economic (interests lie in practical affairs);
aesthetic (places greatest value on form and beauty); social (chief
interest in people); political (primary interest in power, influence,
and renown); and religious (committed to mystical values, seeks
to comprehend the universe). The Study assumes that a person's
philosophy of life is revealed by the strength of his basic
attitudes. See Figure 7-3.

* Published by the Houghton Mifflin Co., Boston, Mass.
Scope. College students and adults. 


Scoring. To score, one simply adds the weights assigned the
various items. The total score for each of the six values can be
plotted on a profile to show graphically the relative strengths of
the individual's attitudes. Norms are for college students, for
men and women separately, and for some occupational groups.
The Values inventory has shown expected differences between
medical and theological students and characteristic differences
among other occupational groups. The inventory is useful in
counseling and in personnel selection. It is also valuable in
forecasting the direction of a student's attitudes.
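Adding the credited weights value by value and then ranking the six totals is what a profile displays. A minimal sketch, assuming each answer has already been translated into credits for the values it contrasts; the three items' credits below are invented, not taken from the Study's key.

```python
VALUES = ("theoretical", "economic", "aesthetic", "social", "political", "religious")

# Hypothetical credits earned on three answered items; each item contrasts
# two or more values and credits points to them.
item_credits = [
    {"theoretical": 3, "economic": 0},
    {"religious": 1, "social": 2},
    {"aesthetic": 4, "theoretical": 2, "political": 1, "economic": 0},
]


def value_totals(credits):
    """Add the credits for each of the six values across all items."""
    totals = dict.fromkeys(VALUES, 0)
    for item in credits:
        for value, points in item.items():
            totals[value] += points
    return totals


def profile_order(totals):
    """List the values from strongest to weakest, as read from a profile."""
    return sorted(totals, key=totals.get, reverse=True)


totals = value_totals(item_credits)
order = profile_order(totals)
```

The ranking, rather than any single total, is what the text means by the relative strengths of the individual's attitudes.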


FIGURE 7-3 Specimen Items from a Study of Values

(Answers are indicated by checking or marking.)

Theoretical v. Economic:
1. The main object of scientific research should be the discovery
   of truth rather than its practical applications. (a) Yes; (b) No.

Religious v. Social:
9. Which of these character traits do you consider the more
   desirable: (a) high ideals and reverence; (b) unselfishness and
   sympathy?

Aesthetic v. theoretical v. political v. economic:
10. Which of the following would you prefer to do during part of your
    next summer vacation (if your ability and other conditions permit):
    a. write and publish an original biological essay or article
    b. stay in some secluded part of the country where you
       can appreciate fine scenery
    c. enter a local tennis or other athletic tournament
    d. get experience in some new line of business

From Allport-Vernon-Lindzey, "Study of Values." Reproduced by
permission of Houghton Mifflin Company.
INTEREST INVENTORIES

The interest inventory is essentially [...]
[...]n the school, knowledge of a [...]
[...]cal significance for the counselor or teacher. The printed interest
inventory [...]
[...]tematic information about a student's
personality trends which otherwise could [...]
long interview, if at [...]
[...] which differ widely [...]
goals. The astute co[...]
[...]ventory to suggest occupational areas which [...]
had not even thought of.

The interest inventory is generally [...]
[...]ir true feelings, especially if they are
not conventional or socially acceptable. An interest inventory is
not likely to be faked or responded to adversely. Examinees find
it impersonal, less prying, and often interesting in itself. Hence
their appraisals are usually honest. From an [...]
[...] venture, for active rather than passive ro[...]


The best-known interest inventories are vocational and were
intended for adults. As a result, they are not very useful below
the eighth grade. This is not a serious disadvantage, however,
since the interests of elementary children are often uncrystallized
and may be superficial, unreliable, and unrealistic. The moving
pictures, TV, and romantic stories invest certain activities (those
of the actor, the game hunter, the space adventurer, and the detective,
to cite a few) with artificial glamor. Moreover, a pupil's information
concerning many occupations (time required, aptitudes
needed, financial returns to be expected) is often meager and
distorted. Even in high school and college, when choice of
vocation becomes crucial, information on many occupations is not
available. A number of pamphlets describing the requirements
for various occupations will be helpful to counselors and students
(see page 253).

This section will describe three interest inventories, one suitable
for pupils whose reading level is up to sixth-grade standards,
and two for high-school and college students and for adults:

Occupational Interest Inventory
Kuder Preference Record
Strong Vocational Interest Blank


Occupational Interest Inventory*

Description. This inventory provides scores in three interest
areas. First, there are scores in six basic fields of occupational
interest: personal-social (personal contacts, service fields);
natural (outdoor activities, farming); mechanical (machinery
design, building, constructing things); business (activities of the
business world, the "profit motive"); arts (music, literature,
drama); sciences (chemistry, engineering, biology). Second,
certain items are designated verbal, manipulative, or computational,
and scores in these areas provide information as to the direction
of one's interests. Finally, the attempt is made to gauge the level
of an examinee's interests: whether his interests identify him
with simple routine aspects of a job or with the more expert
performances and skills.

The six basic fields of occupational interest show considerable
overlap, and their identity as strictly separate compartments of
[...]

[...] question asked. The large letter before a question gives the
interest field, the small letter the interest level, and the symbol the
interest type (whether verbal, manipulative, or computational).

Part I

Directions: Draw a circle around the letter of the activity you
prefer. For example, if you prefer to drive an ice-cream
truck and sell ice cream, draw a circle around
A 1 as shown below:

(A 1)  Drive an ice-cream truck and sell ice cream.
 F 1   Wrap articles in the shipping department of a store.

However, if you prefer the second activity, draw a circle around
F 1. A second item, to be marked according to the same
direc[...]:

AK 14  Conduct visitors through art galleries and museums
E 14   Help build automobiles, ships, or airplanes

Part II

Directions: Below you will find three [...]
number. You are to choose the one you prefer to do
of the three in each group. Indicate your choice by
marking the letter preceding the activity.

[...]  Design or construct stained glass [...] metal ornaments or
       plastic figures
b 11   Make pottery, statues or book ends
d 11   Carve wood or stone or make metal ornamental figures


Scope. The Intermediate Inventory begins with grade 7 but 
may be used with bright sixth graders. The Advanced Inventory 
is for grade 9 and for adults. 


Scoring. Percentile ranks for the six basic interest fields may 
be read from the appropriate tables. Standard scores from con- 
verted raw scores are also provided. Part scores may be repre- 
sented graphically by means of profiles for a clearer comparison 
of interest-strength. The working time for the test is thirty to
forty minutes. Scoring is simple: the items designated by letters
and symbols are counted.
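Counting the designated items can be sketched with Python's `collections.Counter`. The text says each item carries a field letter, a level letter, and a type symbol; here those designations are represented by plain labels, and the sample answers are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Choice(NamedTuple):
    """One preferred activity with its printed designations (labels assumed)."""
    field: str  # one of the six basic fields, e.g. "arts"
    level: str  # interest level, here "routine" or "expert"
    kind: str   # "verbal", "manipulative", or "computational"


def tally(choices):
    """Count the chosen activities by field, by level, and by type."""
    return {
        "field": Counter(c.field for c in choices),
        "level": Counter(c.level for c in choices),
        "type": Counter(c.kind for c in choices),
    }


# Hypothetical answers from one examinee.
answers = [
    Choice("arts", "expert", "manipulative"),
    Choice("mechanical", "routine", "manipulative"),
    Choice("arts", "routine", "verbal"),
]
counts = tally(answers)
```

The raw count in each field would then be converted to a percentile rank or standard score from the Manual's tables.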


Kuder Preference Record* 

Description. The Kuder Preference Record (Form B1) is a
widely used vocational interest blank. There are 360 items in all,
arranged in groups of three. In each set of three, the examinee is
asked to indicate which activity he would like most and which he
would like least. Response is made by punching a small hole with
a pin, and the answer is recorded on a specially prepared answer
sheet placed under the test blank. The samples below (given as
examples in the Record) are for illustration:


Directions: You will find a number of activities listed in groups
of three on the following pages. Read over the three
activities in each group. Decide which of the three
activities you like most. Note the letter in front of
it and punch a hole beneath the 1 beside this letter
in the column at the right, using the pin with which
you have been provided. Then decide which activity
you like least and punch a hole beneath the 3
beside the corresponding letter in the column at the
right.

* Published by Science Research Associates, Chicago, Ill. There are 3 forms,
2 vocational and 1 personal.
Example #1
                                  1       3
P. Visit an art gallery           O   P   O
Q. Browse in a library            O   Q   ●    <- least
R. Visit a museum                 ●   R   O    <- most

(The punch in the hole beneath 1 beside R shows that the examinee
would most like to visit a museum. The punch in the hole
beneath 3 beside Q means that he would least like to browse in
a library.)

Example #2
                                  1       3
S. Collect autographs             O   S   O
T. Collect coins                  ●   T   O    <- most
U. Collect butterflies            O   U   ●    <- least

(The punches show that the examinee would most like to collect
coins and that he would least like to collect butterflies.)


[...] choice type in that the examinee
has to make a selection among limi[...] [...] literary,
[...] or young adult [...]es required in many [...]
this is a help to the counselor, who can then describe these
vocations to his client. The Manual provides valuable information
about the interest [...] various lines of work, [...] reveal interest trends
over several broad areas rather than in fairly specific occupations.
There are, for example, fifty-three occupations grouped under
scientific interests. Items in the Record were first organized on
a logical basis in the light of everyday experience and common
sense. Later, items were analyzed statistically in order to isolate
clusters of highly correlated items. These clusters were taken to
reveal a core of interest.

The Kuder Record relies for its validity primarily on content 
analysis and logical relations. The number of choices offered 
and their nature sometimes confuse students; and the inability 
to find clear-cut preferences may lead to dissatisfaction with the 
forced-choice aspect of the test. Below the eighth grade the 
reading level is probably too high, and the Record should not
be used. The fact that the scoring plan does not weight sharply
strong vs. weak interests has been another criticism of the test.
At the same time, the Kuder Record is an excellent measure of
the range of expressed interests, and as such is valuable in
educational and vocational guidance. It is often possible to point
out to a student that he has expressed many interests not in line
with his vocational goals. The median reliability coefficient for
the nine interest-areas is .91.


Strong Vocational Interest Blank* 


Description. This was the first vocational interest blank and is
still the best-known. There are forms for men and women. The
Blank has gone through several revisions and in its latest form
comprises four hundred items grouped under eight categories.
These are occupations (likes and dislikes), school subjects,
amusements, outdoor and indoor activities, responses to
peculiarities of people, choice of activities, comparison of [...] and
evaluation of personal abilities. The examinee indicates his [...]
by circling or marking. Answers to the items are [...]
weights, obtained by comparing the replies of a [...] occupational
group (lawyers, for example) with the replies of people
in general. In all, forty-five occupations or areas of interest are
covered by the Blank.
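The principle behind Strong's occupational keys, weights obtained by contrasting one group's replies with those of people in general, can be sketched in a deliberately simplified form. The rule below (+1 when the occupational group marks an answer at least 15 percentage points more often than people in general, -1 when at least 15 points less often, 0 otherwise), the sample items, and all percentages are assumptions for illustration; the Blank's actual weights are not reproduced here.

```python
def derive_weights(pct_occupation, pct_general, threshold=15.0):
    """Assign +1, -1, or 0 to each answer from the percentage-point gap between
    the occupational group and people in general (a simplified, assumed rule)."""
    weights = {}
    for answer, p_occ in pct_occupation.items():
        gap = p_occ - pct_general[answer]
        if gap >= threshold:
            weights[answer] = 1
        elif gap <= -threshold:
            weights[answer] = -1
        else:
            weights[answer] = 0
    return weights


def key_score(marked, weights):
    """Total the plus and minus credits for the answers an examinee marked."""
    return sum(weights.get(answer, 0) for answer in marked)


# Hypothetical percentages marking "L" (like) for three sample items.
engineers = {("Machinist", "L"): 60, ("Poet", "L"): 10, ("Golf", "L"): 40}
general = {("Machinist", "L"): 30, ("Poet", "L"): 30, ("Golf", "L"): 38}

engineer_key = derive_weights(engineers, general)
score = key_score([("Machinist", "L"), ("Golf", "L")], engineer_key)
```

Scoring the same blank for another occupation would mean building that occupation's key the same way and calling `key_score` again, which mirrors the separate key used for each vocation.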


Scope. For men and women. 


Scoring. A person's score on a given scale (his interest in teaching,
for example) is found by totaling the plus (+) and minus
(-) credits obtained from the options he has marked. A separate
key is used for each vocation. Thus an examinee's blank may be
scored for the interests of an engineer, a physician, and a sales[...]
[...] identification of interests with
[...] B, somewhat lesser agreement,
[...] interest pattern from that of the
occupation under study. For example, a college student may have
[...] a minister and social worker, and
[...] of a mathematician or physicist.
[...], and often more valuable than
[...] scales for interest clusters. There
[...] interest in social science,
[...] or school superintendent); [...]
[...]neer), and business-commercial
[...] which his parents have for him

SUMMARY ON PERSONALITY INVENTORIES

Validity. Insofar as an inventory includes [...]
experts agree are relevant to the area being tested, the
questionnaire has content validity. The adjustment inventories (PD
Sheets) are made up of items drawn from texts on abnormal psy- 
chology and cover conditions which have been found to be 
symptomatic of mental illness. The interest inventories have been 
validated experimentally against a number of criteria: expressed 
interests of successful professional and businessmen, successful 
completion of training courses, ratings for work success, staying 
in an occupation vs. leaving it, and degree of job satisfaction. 
Correlational analysis has been used to locate clusters of items 
which embrace a common core of interest or a community of 
interest patterns. Follow-up studies of the Strong Vocational 
Interest Blank show that men tend to stay in occupations for 


which they expressed strong interests as students and to change 


occupations for which their expressed interests were weak. 
In using interest inventories, several precautions should be
taken. It is well to remember that interest and aptitude are not
the same thing, and that many youngsters express interest in
vocations for which they have little capacity. Again, the interests
of young people (especially those below the age of 25) often
change markedly. Adolescents may express unrealistic interests
which change drastically later on. More than one determination
of interests, therefore, should be made. Finally, it must be
remembered that advice about occupational families is much
safer than advice about specific jobs. Any inventory, personality
or interest, should be supplemented by school and intelligence
records, and by ratings for health, appearance, motivation, and
socioeconomic status.
Reliability. The reliability coefficients of most inventories are
high (.80 or more). Since interests change over a period of time,
these coefficients can be relied on for short periods only.
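A test-retest coefficient is simply the Pearson correlation between scores from two administrations of the same inventory. A sketch with invented scores for eight examinees (hypothetical data, not from any published study):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists of at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of eight examinees on a test and a retest a few weeks later.
first = [12, 15, 9, 20, 17, 11, 14, 18]
second = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(first, second)
print(f"test-retest r = {r:.2f}")
```

If the retest came many months later, shifting interests would tend to lower r, which is why such coefficients hold only for short periods.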


Scaling. Inventories are usually scored by assigning weights to
the various options presented. These points are converted into
percentile norms, standard scores, and sometimes letter grades.
Norms for the adjustment inventories are most often for students,
less often for occupational groups. The interest inventories
report norms for occupational families (Kuder) and for specific
occupations (Strong). Test Manuals provide many useful
suggestions for the interpretation of the inventories.
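The conversion from raw points to standard scores, and from there to letter grades, can be sketched as follows. The T-score convention (mean 50, standard deviation 10) is one common choice, and the norm-group mean and standard deviation and the letter-grade cutoffs are hypothetical; each inventory's Manual defines its own scales.

```python
def t_score(raw: float, mean: float, sd: float) -> float:
    """Standard score on a scale with mean 50 and standard deviation 10."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 50 + 10 * (raw - mean) / sd


# Hypothetical letter-grade cutoffs on the T scale, highest first.
GRADE_CUTOFFS = [(65, "A"), (55, "B"), (45, "C"), (35, "D")]


def letter_grade(t: float) -> str:
    """Return the first grade whose cutoff the T score reaches."""
    for cutoff, grade in GRADE_CUTOFFS:
        if t >= cutoff:
            return grade
    return "E"


# Hypothetical norm group: mean 50 points, standard deviation 8 points.
t = t_score(62, mean=50, sd=8)
grade = letter_grade(t)
```

A raw score 1.5 standard deviations above the norm-group mean becomes a T score of 65 under these assumptions.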


SUGGESTIONS FOR FURTHER READING 


Anastasi, A. Psychological Testing. New York: Macmillan, 1954. 

Freeman, F. S. Theory and Practice of Psychological Testing (Rev. 
edition). New York: Holt, 1955.

Jordan, A. M. Measurement in Education. New York: McGraw-Hill, 
1953. 


Travers, R. M. Educational Measurement. New York: Macmillan, 1955. 


SUGGESTIONS FOR LABORATORY WORK


1. One of the best ways to become acquainted with a personality inven- 
tory is to take it yourself. Members of the class should take as many of 


the questionnaires as are available, score them, and draw up profiles 
where called for. 


2. Examine the Manual for the Kuder [...]
What is said about validity, reliability, [...]

3. Study the Manual for the Allport A-S Reaction Study and/or the [...]

QUESTIONS FOR DISCUSSION

1. In an adjustment inventory, the number of positive symptoms
becomes the score. What is meant by saying that Stanley is on the median
for a personal-data sheet?

2. Which interest blank, the Kuder or the Strong, is more appropriate
for high-school students?

3. How might an interest inventory be used in studying a child's
personality trends?

4. How closely related are interests and aptitude? Does the
relationship change with age?

5. Why are personal data blanks of little value when administered as
"group tests"?

6. Under what circumstances do you think the [...]
would be most helpful? At what age levels? Give [...]
answers.
7. What factors limit the usefulness of paper-and-pencil adjustment
inventories? 

8. A high-school senior expresses a strong interest in engineering, but 
his interest inventory score does not confirm this interest. What would 
you as counselor suggest to him? 

9. The Strong Vocational Interest Blank has a key for various specific 
occupational interests—dentist, banker, carpenter, for example. What 
difficulties do you see in such restricted interest patterns? 

10. Why is a personal data sheet easier to fake than an interest blank,
even when the items are not forced-choice?


CHAPTER 8 


OBJECTIVE-TEST ITEMS AND 
SHORT-ANSWER TECHNIQUES 


There are at least two reasons why the teacher interested in
guidance should be familiar with the main types of objective test
items. In the first place, the most widely used standard group
tests are made up of items of the objective sort. (See Chapters
4 and 5.) Hence, a knowledge of the strengths and weaknesses
of objective questions will enable a teacher to make a more
discriminating choice among several tests of intelligence, educational
achievement, or aptitude proposed for use in a school.
Second, a teacher-made test is greatly improved when the teacher
knows the principles which govern the writing of objective-type
items and the assembling of them into a test.
This chapter will describe some of the better-known, and more widely used, verbal objective-type items. These include
the true-false, multiple-choice (best-answer), matching, comple- 
tion, and short-answer essay questions. The advantages and dis- 
advantages of each of these item types are listed and examples 
given to illustrate errors to be avoided in writing items of each 
type. Objective tests employ numbers, geometric forms, pictures, 
and diagrams, as well as words. (Figure 8-1 provides a number of 
illustrations.) Some of the varieties of non-verbal items frequently 
encountered in standard tests fall under the following heads: 


1. Number Series Completion.* The examinee is asked to complete a series of numbers, which are related in some way, by the addition of one or more appropriate numbers.

2. Figure Completion. The examinee must complete a figure by the addition of a line or other detail.

3. Likenesses and Differences. From a list of pictures showing objects or activities, the examinee is required to select several which belong together, or to select an item which does not belong with the others.

4. Picture Completion. The examinee is to complete a picture from which one or more items have been omitted.

5. Errors in a Drawing. The examinee must locate and correct errors in a drawing.

6. Arranging Pictures. The examinee is to arrange a set of pictures in orderly fashion so as to tell a story.

Most non-verbal items are variations on the multiple-choice type. Non-verbal items are frequently used in tests designed for young children. (See page 119.)

* This test is also classified as verbal.

FIGURE 8-1 Objective-Test Items Represented by Picture, Drawing, or Diagram
[Sample items with directions such as “Mark two things good to eat,” “Mark the one thing not like the others,” and “Which of the five figures can be made from the pattern in Example X?”]

Comparison of Objective Items and Essay Questions
The traditional essay question often covers too much ground, and is open to large errors in scoring and interpretation. Consider the question “Discuss the causes of the War of 1812” as an example of a common form of essay question. Answers to this question will almost certainly include material that is true and relevant, material that is ambiguous, material that is erroneous, and material that is mostly padding. It becomes well-nigh impossible for two or more readers to evaluate the answers to such a question in the same way. However, when choices in objective test questions are recorded by checking one of several possible answers, circling a number, or underlining a word or phrase, the grade on the test will be the same whether the work is done by a clerk or by an expert. And the answer will be either right or wrong.

Examinations composed of objective items possess several other advantages over questions of the essay type. The objective item not only eliminates unreliability due to personal opinion but is more easily scored, is economical of time, and allows for a wider sampling of material. Furthermore, the objective test item forces the student to answer a question directly, gives him little opportunity to equivocate or dodge, and is, for that reason, a more dependable measure of what a student knows. On the negative side, the objective item may provide little opportunity for the examinee to display his understanding and organizing ability. When poorly made, the objective item may lay too much stress on rote memory and unrelated bits of information.


Defining the Purpose of the Test Item
It is necessary to keep constantly in mind the purposes which we intend our test items to serve. Items in a standard test, examined ... We cannot always be sure, it is true, of what an item is measuring. But we can sharpen our aim by listing the specifications (page 211) which we want our items to meet. For example, an item should:

1. Elicit information (often fairly specific) which reveals an understanding of a process, principle, situation, or historical movement.
2. Require the examinee to demonstrate knowledge and use of technical terms and concepts.

3. Give the examinee a chance to show his ability to apply a principle in the solution of a problem, draw a conclusion, or arrive at a generalization.

4. Call forth responses which will reveal the examinee's attitudes, interests and personality traits.

Not every item, of course, can be fitted neatly into one of these categories. Some (many, we hope) will cut across several. Nevertheless, each item should be written to achieve a definite purpose, to call out some important bit of knowledge, understanding or application.


Assembling Test Items 


In the process of making an objective test, the type of item to be used must be decided upon and the items written before we are ready to assemble them into tentative form and try out the test. Several problems arise: determining the difficulty of the items and their discriminative power, drawing up directions, and preparing a key and scoring sheets. Methods for carrying out these procedures will be treated in Chapter 9.


TRUE-FALSE ITEMS 


The true-false test presents a series of statements or questions, each of which is to be marked “T” (true) or “F” (false). Instead of circling one of the letters “T” or “F,” the examinee may be asked to circle “Yes” or “No,” or to write + (plus) or − (minus), or in some other way to designate a positive or negative answer.

One of the earliest objective forms, the T-F test is still widely used in group intelligence as well as in educational achievement and aptitude tests. It has been criticized as being a measure of rote memory, as a test of detached and unrelated facts, and as often being ambiguous and equivocal. Such strictures are justified when the test is poorly or carelessly made. There is a large element of
guessing in T-F tests, too, and good items are not easy to con- 
struct, however simple the process may seem to be. But when 
well made, T-F items have valuable possibilities arising from 
their scope and flexibility. The chief advantages and disadvan- 
tages of the T-F item may be summarized as follows: 


Advantages:

1. It may be used with a wide variety of materials.

2. It may be scored easily and objectively.

3. It is the easiest objective type to construct.

4. It makes possible an extensive sampling of material in a relatively short space.

5. It is a time-saver, thus allowing for frequent testing.

6. The directions are readily understood and followed.

Disadvantages:

1. It is often ambiguous and confusing.

2. It is open to guessing and to chance effects.

3. Much subject matter cannot be stated as unequivocally true or false.

4. It may readily become a test of detached and unrelated bits of information.

5. It may overstress rote memory at the expense of understanding.


Some of the rules useful in constructing teacher-made tests are given below. In judging the adequacy of printed T-F items, it will help to note whether these rules have been observed.

1. Putting the symbols “T” and “F” before each question is preferable to having the examinee write the letters at the end of a statement, thus scattering his answers over the page. Circling or marking saves time in scoring test papers and leads to fewer errors, since the letters written by an examinee are not always legible. See examples.
On the test paper:            On the answer sheet:
(T)  F   1. ...               1  (T)  F
 T  (F)  2. ...               2   T  (F)
2. Make the number of true statements equal to the number of false statements. The scoring formula for T-F items is

Score = Right − Wrong
or Score = Total − 2 × Wrong

Either of these formulas corrects for guessing, and both give the same result provided the pupil has tried all of the items. Suppose, for example, that there are sixty items in the test, and a pupil gets forty right and twenty wrong. Then his score is 40 − 20 = 20, or 60 − 2 × 20 = 20. If the child does not try all of the items, the two versions of the formula will not give the same result, and the first (R − W) should be used.

If an examinee guesses at every item, he should, on the average, have one-half of the items right and one-half wrong, and his score (R − W) is properly zero. If an examinee attempts only thirty out of forty items in a given examination, his score may be corrected to a total of 40 by adding one-half of the untried items, that is, half of 10, to his number right. (Presumably he would get one-half of the untried items right by guessing.) It is not necessary to correct every paper to the number of items in the test. But test scores for a class are more fairly compared when all are based on the total number of items in the test.

The correlation between number right and (R − W) is perfect when all of the items of the test have been tried. Hence, when a child's score has been corrected to the total, number right may be taken as the score instead of (R − W). The question of whether to tell an examinee to guess has excited much controversy, partly because of the opprobrium attached to the term “guessing” as related to school examinations. A good general rule is to instruct the student to omit only those items which he is sure he doesn't know, to try an item even when not entirely certain of the answer, but never to guess wildly. Since the examinee has been exposed, at least, to the subject matter of the test, the chances are better than even that his answer will be based on some information, even if it is vague and uncertain. Hence, a T-F answer is more likely to be right than wrong.


3. Avoid opinionated and trivial (or trick) items. 


Examples: T F Character is more important than intelligence. 
T F The ABC Test of Mental Maturity contains 75 
items arranged into 6 sub-tests. 
T F William Collins Bryant is the author of Thana- 
topsis. 
T F One-half of a perfect correlation is .50. 


The first of these items calls for a value judgment, which may be true or false; the second and third ask for trivial information; and the fourth is a trick question which happens to be false.


4. Avoid ambiguous statements, those partly true and partly false, and those containing negatives, especially double negatives.
Examples: T F Socio-economic factors are often the cause of war.
T F William Jennings Bryan, the great Commoner, was twice elected president of the U.S.
T F Not every teacher is careful to avoid having a student dislike his subject.
T F Not all instincts are maladaptive.

The first item is ambiguous; the second is partly true and partly false; and the third and fourth are confusing because of the negatives.


5. Avoid textbook language and verbatim quotations. Such items encourage rote memory and are often ambiguous when taken out of context.

Examples: T F The role of the teacher is to help the pupil establish satisfying goals.
T F Heredity determines what a man can do, environment what he does do.


Textbook verbiage aids in making a correct guess. 


6. Avoid specific determiners, such as all, none, always, never, every. Broad generalizations introduced by these words are usually false.
Examples: T F Feeblemindedness is always present in delinquency.
T F Corporal punishment is never justified.
T F All ministers lead lazy lives.

These items are all too general, and all are incorrect.


The T-F item is not so popular among teachers as it was formerly, and it is not found so often in standard tests. It is still ranked high, however, and is perhaps the quickest way of surveying a wide range of material. When supplemented by other test forms, T-F is a valuable objective item.
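The T-F scoring rules given under rule 2 reduce to simple arithmetic, sketched here in Python as an illustration only. The function names (`tf_score`, `tf_score_from_total`, `right_corrected_to_total`) are hypothetical, and the 24-right, 6-wrong split in the second example is an assumed case rather than data from the text. The final check shows why number right and R − W correlate perfectly when every item is tried: R − W then equals 2R − Total, a straight-line function of R.

```python
def tf_score(right: int, wrong: int) -> int:
    """Score = Right - Wrong, the T-F correction for guessing."""
    return right - wrong


def tf_score_from_total(total: int, wrong: int) -> int:
    """Score = Total - 2 x Wrong; agrees with Right - Wrong only
    when every item has been attempted (right + wrong == total)."""
    return total - 2 * wrong


def right_corrected_to_total(right: int, wrong: int, total: int) -> float:
    """Add one-half of the untried items to the number right,
    assuming guessing would get half of them right."""
    untried = total - right - wrong
    return right + untried / 2


# Worked example from the text: 60 items, 40 right, 20 wrong.
print(tf_score(40, 20))             # 20
print(tf_score_from_total(60, 20))  # 20

# Hypothetical case: 30 of 40 items attempted, 24 right and 6 wrong.
print(right_corrected_to_total(24, 6, 40))  # 29.0
print(tf_score(24, 6))                      # 18
print(tf_score_from_total(40, 6))           # 28: the two formulas now disagree

# With every item tried, W = Total - R, so R - W = 2R - Total.
assert all(tf_score(r, 40 - r) == 2 * r - 40 for r in range(41))
```

The disagreement in the hypothetical case is exactly the 10 untried items: Total − 2 × Wrong silently counts every omitted item as right, which is why the text recommends R − W when items are skipped.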


MULTIPLE-CHOICE OR BEST-ANSWER ITEMS 


The multiple-choice item consists of a statement, question, phrase, or word followed by several responses, only one of which is correct. Multiple-choice is one of the most flexible of the objective-recognition-type forms. It is a favorite with teachers when making their own examinations, and is the form most widely employed in standard printed tests. Multiple-choice items can be so constructed as to measure information, comprehension, understanding of principles, and ability to interpret data. The test form is applicable to most subjects and to most materials.
Some of the strengths and weaknesses of the multiple-choice item can be summarized as follows:


Advantages:

1. Answers are objective and are rapidly scored.

2. Items may be written to measure inference, discrimination, and judgment.

3. Guessing is minimized when four or five choices are allowed.

4. Items may be constructed to measure recall as well as recognition.

Disadvantages:

1. Items are often too factual, stressing memory unduly.

2. More than one response may be correct or very nearly correct. 

3. It is difficult to exclude clues. 

4. Distractors—that is, incorrect but plausible answers—are 
often hard to find. 


Rules for constructing multiple-choice items for teacher-made tests and for judging the adequacy of such items in printed tests are as follows:


1. Vary the position of the correct response: put the right answer in the first, second, third, and fourth positions equally often. A scoring formula for multiple-choice items is

Score = Right − Wrong/(n − 1)

in which n = the number of choices, usually four or five. This formula is used to correct for guessing on the assumption that each response is equally likely. This conjecture is correct when the examinee has no idea of the right answer; but in educational achievement examinations, as well as in other tests, it is a questionable hypothesis. Distractors differ greatly in plausibility and likelihood; and since the student presumably has some knowledge of the question, he is more likely to mark the right than a wrong answer. In most educational achievement tests, taking the number right as score saves time and is accurate enough for most purposes. It must be remembered that in a given test the number of options must be the same for each item if the above correction formula is to be used.

2. Do not include responses which are so unlikely or implausible, or so unrelated to the question, as to give the answer away. Distracting responses should distract, not confuse.
Examples: The function of a flower is to
____ give pleasure to mankind
____ attract insects
____ produce seed
____ illustrate the modification of leaves

The capital of the United States is
____ Washington
____ Rome
____ Tokyo
____ London
____ Honolulu

The principal crop of Iowa is
____ pineapples
____ corn
____ oranges
____ bananas

In the first example, assuming the fourth choice to be the correct one, the distractors are all rather silly. In examples two and three, an examinee would have to be almost totally ignorant of geography to be taken in by the distractors.

3. Do not provide wrong answers which are plausible enough to mislead the good student because they are close to the right answer. The good student is often led astray by knowing a good deal (but not quite enough) about a question, whereas the poor student does not know enough to be misled by a plausible but wrong answer.

Example: What was one of the important immediate results of the War of 1812?
____ the introduction of a period of intense sectionalism
____ destruction of the U.S. bank
____ defeat of the Jeffersonian party
____ final collapse of the Federalist party

The fourth response is keyed as the correct one. But 39 per cent of the ... students checked the first option as correct. Apparently the first answer is plausible to students who know a good deal about the War of 1812.


4. Do not give away the correct answer by providing clues such as (a) familiar textbook phrases, (b) having the right option consistently longer or shorter than the wrong options, (c) repeating the words of the question, (d) asking questions to which the answer must be singular or plural, with only the correct response being in the right number.


Examples: In what major labor group have unions been organized on an industrial basis? (Circle one letter.)
A. Congress of Industrial Organizations
B. Railway Brotherhoods
C. American Federation of Labor
D. Knights of Labor
E. Workers of the World

The meaning of the German word Gestalten is (Check one)
____ response
____ a just-noticeable-difference
____ a stimulus
____ configurations
____ a perception

A man hears a loud noise and runs to the window. This is an example of
____ motivation
____ memory image
____ stimulus-response
____ posthypnotic suggestion
____ purposive behavior

In the first of these examples, the adjective “industrial” in the question gives the answer away. In the second, if the student knows that Gestalten is the plural of the German word Gestalt, he has the answer as “configurations.” In the third, the textbook phrase “stimulus-response” is a clear clue.


5. In a multiple-choice vocabulary test, none of the response choices should be as difficult as the test word. The difficulty of response words can be determined from their frequency in Thorndike's Teachers' Word Book. Response words should be of the same part of speech as the test word, and only one should be correct.

Good example: An irksome task is (a) pleasant, (b) engrossing, (c) instructive, (d) wearisome.

Poor example: Do not despise him means do not (a) hate, (b) malign, (c) deprecate, (d) desiccate him.

In the second example, some of the response words are more difficult than the test word. This is not true of the first example.


6. Direct questions or statements followed by a series of options are usually clearer than questions in which the answers are imbedded in the statement. In the latter form, the examinee must read through the statement for each option.

Good example: A 10-year-old receives a percentile rank of 40 on a test of arithmetic. This means that
____ he is above the mean of 10-year-olds on the test.
____ he exceeds 60 per cent of 10-year-olds.
____ 40 per cent of 10-year-olds did worse than he.
____ 61 per cent of 10-year-olds exceeded his score.

Poor example: Percentile rank shows the per cent (a) at or above, (b) above, (c) at, (d) below, (e) at or below the given score.

The second example is more difficult to decipher than the first.


A test made up of multiple-choice items takes more time to construct than a test of T-F items. Furthermore, good multiple-choice items are harder to prepare than T-F items, since it is often difficult to find acceptable distractors. The advantage of the T-F item is largely offset, however, by the fact that multiple-choice items are more searching and demand more knowledge of the subject matter. Multiple choice is regarded by most test experts as the best of the short-answer forms.
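The correction formula given under rule 1, Score = Right − Wrong/(n − 1), can be sketched as follows. This is a minimal illustration built on the text's own assumption that every option is equally likely when the examinee guesses blindly; the function names and the 50-item example are hypothetical. Exact fractions (Python's standard `fractions.Fraction`) are used so that a score such as 80/3 is not rounded.

```python
from fractions import Fraction


def mc_score(right: int, wrong: int, n_choices: int) -> Fraction:
    """Score = Right - Wrong / (n - 1).

    n_choices must be the same for every item in the test, as the text
    requires; omitted items count neither right nor wrong.
    """
    if n_choices < 2:
        raise ValueError("an item needs at least two options")
    return right - Fraction(wrong, n_choices - 1)


def expected_blind_guess_score(n_items: int, n_choices: int) -> Fraction:
    """Expected corrected score if every option is picked at random."""
    expected_right = Fraction(n_items, n_choices)
    expected_wrong = n_items - expected_right
    return expected_right - expected_wrong / (n_choices - 1)


# Hypothetical 50-item test, five options each: 30 right, 16 wrong, 4 omitted.
print(mc_score(30, 16, 5))                 # 26
# With n = 2 the formula reduces to the T-F rule, Right - Wrong.
print(mc_score(40, 20, 2))                 # 20
# Pure guessing earns nothing on average, whatever n is.
print(expected_blind_guess_score(100, 4))  # 0
```

As the text cautions, the zero expectation holds only under the equal-likelihood assumption; when distractors differ in plausibility, number right is usually an adequate score.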


MULTIPLE-RESPONSE ITEMS
The multiple-response item is a variation of the multiple-choice type of question. Essentially it presents a statement or topic followed by several responses, more than one of which may be checked as correct. The multiple-response examination ... is useful in obtaining information ... This examination form is often called a check list. The advantages and disadvantages, as well as the rules for construction given above for multiple-choice, apply to multiple-response items as well. Two examples follow:

Example: Under each of the following psychological doctrines, viewpoints, or systems, indicate by a cross (x) those implications or consequences which are characteristic of that doctrine.

1. English Associationism
____ a persisting self
____ summation and integration of mental states
____ universal categories of reason
____ mental faculties
____ persisting motor response systems

2. Purposive Psychology (McDougall)
____ imageless thought
____ motivation in terms of instincts
____ doctrine of the unconscious
____ conative tendencies

Each of these items can be described by more than one choice.


MATCHING ITEMS 


In the matching test, one list of words, names, phrases, formulas, or statements is to be matched against another list. The test may consist of (a) a list of names in one column to be matched against a list of achievements in a second column; (b) a list of terms to be matched against a list of definitions; (c) labels to be matched against charts and diagrams; (d) authors to be matched against books, dates and events.

The matching item possesses the advantages of interest and variety as well as ease of scoring. It is, furthermore, somewhat easier to construct than the multiple-choice item. Matching has been frequently used to test the relationship between dates, events and various facts. On the negative side, the matching item often measures recognition memory rather than understanding, and is especially open to clues. Nor do matching items ordinarily test ability to organize facts or to apply principles.
Rules for making up matching test items may be set down as follows:

1. Do not include too many items in the lists: 10 or 12 is the maximum, 5 or 7 often enough. When lists are long, examinees must spend too much time hunting through them. Have the number of items in the column from which selections are to be made larger than the number in the list to be matched. This lessens the chances that an examinee will match an item correctly by a process of elimination.

Example: The following statements are representative of different schools of psychology. In the blank spaces before the statements, write the number of the psychologist for whom the statement is typical.

(1) Adler        (7) McDougall
(2) Angell       (8) James Mill
(3) Calkins      (9) Pavlov
(4) Freud        (10) Titchener
(5) Jung         (11) Watson
(6) Koehler      (12) Woodworth

____ Sensory processes have the attribute of clearness, just as they have quality and intensity.
____ There is evidence for the existence of three types of native and unlearned emotional reactions: fear, rage and love.
____ The inadequacy, the relative futility, of all attempts to ignore the purposive, the goal-seeking nature of behavior renders behaviorism untenable.
____ Any mechanism, except perhaps some of the most rudimentary that give the simple reflexes, once it is aroused, is capable of furnishing its own drive and also lending drive to other connecting mechanisms.
____ The will-to-power is the great motive in mental conflict.
____ The superego represents the repressions of instinct and dominates the ego.
____ Mind is primarily engaged in mediating between environment and the needs of the organism.
____ Sensations are one of the primary states of consciousness; ideas are the other.

2. Select materials from one subject-field only, so that a given item in column 1 has several plausible matches in column 2. Explain clearly the basis of the matching.

Example: In column 1 are words which illustrate a number of parts of speech; in column 2 is a list of various parts of speech. Determine what part of speech a word is and then identify it by putting its number before the proper item in column 2. For example, “boy” is a noun, and if “boy” were numbered 5, a 5 would be placed before the word “noun” in column 2. Arrange the choices in alphabetical order.

(1) and          ____ adjective
(2) cat          ____ adverb
(3) rapidly      ____ noun
(4) jump         ____ preposition
(5) from         ____ verb
(6) rich
(7) either

Example: Match the items in column 1 with the appropriate items in column 2.

A. Harvey        ____ contagious disease
B. stomach       ____ digests food
C. poison        ____ discovered circulation of the blood
D. Galen         ____ early Greek physician
E. lungs         ____ supplies oxygen to the blood
F. heart
G. measles

The first example is quite easy, but it should enable a teacher to spot grammatical confusions. All of the material is from the field of grammar. The second item is poor owing to heterogeneity in the list of choices (names and bodily organs).

3. Arrange names in alphabetical order, and dates and numbers in sequence, in order to save the examinee's time.

Example: Select the inventor from the first list and put his number opposite his invention in the second list.

(1) ...          ____ Atlantic cable
(2) Edison       ____ cotton gin
(3) Field        ____ electric starter
(4) Franklin     ____ sewing machine
(5) Howe         ____ steam engine
(6) Kettering    ____ wireless telegraphy
(7) Marconi
(8) Watt
(9) Whitney


4. Avoid clues: for instance, one singular item in both lists, the others plural; or one item in the list of a different part of speech from the others. Watch for irrelevant (but revealing) associations, such as nationality, which give away the matching. For example, if the examinee knows that a certain discovery was made by a Frenchman, he will look for a French name.

The matching item is compact and usually interesting to students. It enables a teacher to cover a wide territory in a fairly short time. Matching is well suited to rapid surveys of specific aspects of a field when persons, events, or definitions are wanted, or when these constitute necessary knowledge for further work in the subject.
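The advice in matching rule 1, to make the answer column longer than the list to be matched, can be made concrete with a simple model. The model is an assumption, not something stated in the text: each choice may be used only once, the examinee matches k premises with certainty, and he distributes the remaining choices at random over the premises he does not know. Under that model the chance of hitting any one unknown premise is 1/(c − k), where c is the number of choices. The function names are hypothetical; the figures of 8 statements and 12 names come from the psychology example under rule 1.

```python
from fractions import Fraction


def chance_per_unknown(n_choices: int, n_known: int) -> Fraction:
    """Probability of matching one unknown premise correctly by random
    assignment of the choices left over after the known matches are made
    (assumes each choice may be used only once)."""
    remaining = n_choices - n_known
    if remaining < 1:
        raise ValueError("no choices left to assign")
    return Fraction(1, remaining)


def expected_free_matches(n_premises: int, n_choices: int, n_known: int) -> Fraction:
    """Expected number of unknown premises matched correctly anyway."""
    unknown = n_premises - n_known
    return unknown * chance_per_unknown(n_choices, n_known)


# 8 statements; the examinee knows 7 of them.
print(expected_free_matches(8, 8, 7))   # 1   (equal lists: the last match is a gift)
print(expected_free_matches(8, 12, 7))  # 1/5 (12 names to choose from)
```

With equal lists the last match is settled by elimination alone; adding four extra names cuts that free credit to one chance in five.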


COMPLETION ITEMS 


In this test form, sentences are presented from which certain words or phrases have been omitted. Instructions are to fill in the blanks so as to complete the meaning. Completion requires recall primarily, but it also demands thought and the ability to perceive over-all relationships. Little opportunity is afforded for guessing. The chief disadvantage of this test form lies in the scoring, which is not entirely objective and is often time-consuming, and in the fact that too many blanks confuse the examinee and make a puzzle out of the test. Completion has been a favorite of teachers in their own examination-making, although it is not so widely used today as multiple-choice and T-F items.

Rules for writing completion items and errors to be avoided in such items are as follows:


1. Do not copy sentences and paragraphs directly from the textbook, since this puts too much emphasis on rote memory and ... learning. Rephrase the language of the text, if that is used.

Example: Human behavior, more than that of any other animal, is a product of ________.

Example: Much learning is by trial and ________.

The first is a poor item. It is out of the textbook and will be known by those who recall the textbook language. The second is also a poor item: the pat expression “trial-and-error” gives it away.


2. Too many blanks make it impossible for the examinee to get the meaning, especially if the sentence is short.

Example: Civilized man ________; uncivilized man ________.
This item actually appeared on a printed test. It is impossible to complete it, or else it can be completed in a wide variety of ways, most of them not indicative of much knowledge.

3. Scoring is more objective if words rather than phrases are deleted. Blank out key words, those which carry the meaning of the sentence or paragraph, not unnecessary elements or the articles a, an, the.


Examples: Democracy is that form of ________ in which all of the ________ ... power through ________ elected by ________.
Democracy is that ... of government in which ________ of the people ________ governing power, ________ representatives elected ________.

The first form of the item is the better, since the blanks contain key words. The second version deletes connecting words which do not carry the meaning of the sentence.
4. Make the blanks long enough to permit legible answers. Have all blanks of standard length to avoid clues as to the length of the completing word.

5. When there are several acceptable answers, include them in a scoring key. Alternate answers may be weighted for goodness.

6. Guard against clues by taking care that completions do not depend upon (a) grammatical form, (b) pat or textbook expressions.

Examples: Johnny wears his space suit, even when he ________ to bed.
A much discussed question is the relative importance ...
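Completion rule 5, listing acceptable alternates in a scoring key and weighting them, might be organized as below. This is a hypothetical sketch: the key entries, the half-credit weight for “delegates,” and the function names are illustrative assumptions, not taken from the text. Responses are lower-cased and their spacing normalized before lookup, so “Government” and “government” earn the same credit.

```python
# Hypothetical scoring key: blank number -> {acceptable answer: credit}.
SCORING_KEY: dict[int, dict[str, float]] = {
    1: {"government": 1.0},
    2: {"representatives": 1.0, "delegates": 0.5},  # weighted alternate
}


def normalize(response: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(response.lower().split())


def score_completion(responses: dict[int, str],
                     key: dict[int, dict[str, float]] = SCORING_KEY) -> float:
    """Sum the credit earned on each blank; blanks left empty, or answered
    with anything not in the key, earn nothing."""
    total = 0.0
    for blank, acceptable in key.items():
        given = normalize(responses.get(blank, ""))
        total += acceptable.get(given, 0.0)
    return total


print(score_completion({1: "Government", 2: "delegates"}))  # 1.5
print(score_completion({1: "  government "}))              # 1.0
```

Keeping every acceptable answer in one table means two scorers applying the same key will reach the same total, which is the objectivity the completion form otherwise lacks.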


THE ESSAY QUESTION

The essay question has been a standby of teachers over the years. It is widely used in the “literary” subjects, such as history and English, and in the natural and social sciences as well. The purpose of the essay question is to elicit understanding, organization and interpretation, rather than to test for detached tidbits of knowledge. The form of the essay question is important. Questions beginning with “who,” “what,” “when,” and “where” are usually to be avoided when they ask simply for a name (for example, Napoleon), a date (1492), an event (Battle of Hastings) or a location (New York City). But such questions are valuable when the information asked for is relevant to the solution of a problem, the making of an inference, or the interpretation of some event. Questions beginning with “why,” “how,” “with what consequences,” or “with what significance” are to be preferred to simple fact questions. Questions beginning with such words as “discuss,” “evaluate,” “outline” and “explain” invite, and usually get, a mass of detail, some of it not relevant. Such questions are useful, of course, when we wish to know how well an advanced student can select, reject and organize. But they are hard to score and are virtually useless in a broad survey or for the diagnosis of specific blindspots.


Restricting the Essay Question

The essay question becomes objective when cast into short-answer form and restricted in coverage. Two methods of controlling the essay question and rendering it more specific may be mentioned.
Recall Questions. Recall items are essay questions reduced to the simplest terms. Usually a question is followed by a blank space, varying in length. Answers are restricted to short paragraphs, the account of some event, an algebraic equation and its application, and the like. Recall items resemble completion items, but they provide for fairly free answers and are less restricted.

Examples: (1) Define an invertebrate ..........
(2) (a) Name three scientists who contributed to atomic theory, and (b) list the major contribution of each.

(3) List three conditions which must hold true if an intelligence test is to yield a constant IQ.


The first item calls for a one-line answer. Items (2) and (3) ask for specific but basic knowledge. Compare (3) with the essay question, “Discuss the construction of the Stanford-Binet.”


Example: A skillful teacher has been characterized as one who
(a) maintains a permissive atmosphere.
(b) avoids negative discipline.
(c) conforms to the wishes of parents.
(d) does not use repetitive drill.


Write one paragraph defending or attacking each of these propositions, that is, four paragraphs in all.


Example: A recurring problem in child development is that of 
maturation. Cite the evidence bearing on the problem 
from the following points of view: 

(a) neurological 
(b) co-twin control 
(c) parallel groups 
Scoring the Essay Question Objectively

Perhaps the major weakness of the essay examination lies in the unreliability of its scoring. Scoring can be made more objective by the use of the following techniques:


1. When essay examinations are marked anonymously, there is usually better agreement between different scorers.

2. There is less opportunity for preferences, attitudes, and biases to appear when all papers are read for one question at a time rather than each paper straight through. Obviously, comparisons can be sharper with this method.

3. Before reading a question, the teacher can list the basic facts which the question is intended to bring out. Points may then be assigned to these aspects of the answer. For example, if the question deals with a chemical process, the answer list may include (a) the necessary equations of the process, (b) the chemical elements needed, (c) a diagram of the apparatus, and (d) any by-products of the chemical reaction. If the question deals with English literature, the answer may include (a) the author’s chief contribution, (b) the cultural setting of the time, and (c) the influence of the author’s work. A check list of key points, with credits assigned to each, is a useful technique. Thus, from one to three points may be assigned to each part-answer.

4. If the teacher marks the papers for spelling, writing quality, and grammatical expression, as well as for content and organization, credits should be allotted to these aspects of the answer.
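The check-list technique in item 3 comes down to simple bookkeeping, as the following minimal Python sketch shows. It makes some assumptions. The key points and their credits (one to three points each) are hypothetical and loosely follow the chemistry example. The reader is assumed to have already decided which points each answer contains.

```python
from __future__ import annotations

# Hypothetical check list for one chemistry question: key point -> credit (1 to 3 points).
CHECK_LIST = {
    "equations of the process": 3,
    "chemical elements needed": 2,
    "diagram of the apparatus": 2,
    "by-products of the reaction": 1,
}


def score_answer(points_found: set[str], check_list: dict[str, int] = CHECK_LIST) -> int:
    """Sum the credits for the key points a reader judged present in one answer."""
    unknown = points_found - set(check_list)
    if unknown:
        raise ValueError(f"not on the check list: {sorted(unknown)}")
    return sum(check_list[point] for point in points_found)


def max_score(check_list: dict[str, int] = CHECK_LIST) -> int:
    """Credit available when every key point is present."""
    return sum(check_list.values())


if __name__ == "__main__":
    found = {"equations of the process", "by-products of the reaction"}
    print(f"{score_answer(found)} of {max_score()} points")  # 4 of 8 points
```

Writing the check list down before any paper is read is the point of the technique. Every answer is then scored against the same fixed credits.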


The essay question is a valuable examination form when held to one or more defined themes, so that it is scorable. Many teachers are so impressed by the general use of objective-type items in the standard tests that they are inclined to drop the essay entirely. This is a mistake. Many courses, especially advanced courses, in literature and in science employ objective-test items as a first approach to an examination of the subject. But the essay question is the best (perhaps the only) way in which a teacher can determine whether a student can organize his knowledge and arrange his arguments in logical fashion. Short-answer forms should be regarded not as substitutes for the essay, but rather as supplementary to it.




QUESTIONS AND PROBLEMS 


1. Write five true-false items in some subject field familiar to you.


2. Rewrite the five items in number 2 in completion form.

5. Rewrite the following essay questions to make them more objective in answering and in scoring.
a. Discuss some of the proposals for aiding the gifted child. (Hint: Break down into specific proposals, such as special classes, accelerated promotions, extra assignments, and the like.)
b. Evaluate three of the modern learning theories. (Hint: These may be subdivided under descriptive labels or under names of well-known theorists.)
c. Discuss the causes of the industrial revolution.

6. Point out any errors in the following items:


1. The Frenchman who developed the first successful intelligence test was (1) Kuhlmann (2) Terman (3) Binet (4) Wundt
2. An efficient man is one who is (1) strong (2) handsome (3) angry (4) pusillanimous (5) capable
3. T F Edgar Anderson Poe wrote the poem “The Raven.”
4. T F Lack of emphasis on the three R’s is not a serious defect in modern educational practice.
5. T F Strict application of the Golden Rule will make for better living.
6. T F The median of a distribution of scores is a midpoint which is influenced markedly by very high and very low scores.
7. The best way to spend one’s leisure time is to (1) read good books, (2) look at TV, (3) dig in the garden, (4) play solitaire, (5) relax in an easy chair.
The expansion of the binomial (a + b)² is ..........
12. I borrowed a book (1) off (2) off of (3) from my roommate. (Answer) ..........
13. We get the most calories per pound from (1) candy (2) carbohydrates (3) vitamins (4) potatoes (5) proteins
14. When there is a fire drill, the teacher must make sure that her .......... observe .......... and ..........
15. T F The work of Freud has done much to demonstrate that ... once formed are never lost, even though under conditions ... ally be recalled.


CHAPTER 9 


CONSTRUCTING THE OBJECTIVE TEST 


[...] in many schools. [...]er able to interpret [...]or says about his test when [...] were selected and put together. Even more important, perhaps, the teacher who knows [...] to improve greatly the quality of the day-to-day tests which he makes for his own use.

The construction of a comprehensive battery of educational achievement tests is not a task appropriate for most teachers [...]. [...] printed [...] in wide use today are made [...]. These agencies have a staff of experts in item [...]ction techniques, technically trained assistants, and access to large and representative samples and to laboratory and scoring equipment. The classroom teacher can hardly hope to match all this. And fortunately it isn’t necessary, since his test-making is properly on a much more modest scale.
This chapter will outline the basic techniques in test construction. These methods apply whether the test is designed to measure intelligence, educational achievement, or aptitude.


WRITING SPECIFICATIONS FOR THE TEST 


Before he begins to construct an examination, the teacher must decide what he wants his test to do. This means that he must lay down specifications for the test (page 105). Usually a teacher wants to test his students’ knowledge of the fundamentals of the subject and to see how well they can use this knowledge in solving problems. Three subject matter tests in different areas will be described in order to show what specifications the author had in mind and how he went about accomplishing his objectives.


Columbia Research Bureau Spanish Test*

This test is designed for high schools and colleges. Part I calls for basic knowledge of the language, and Parts II and III require understanding of language structure and application of rules of grammar. In more detail, Part I is a vocabulary test of one hundred words in multiple-choice form. The student is instructed to mark that one of four or five English words which best defines the given Spanish word. Part II is a language comprehension test. There are seventy-five sentences in Spanish arranged in order of difficulty; each is to be read and marked “True” or “False.” Part III is concerned with grammar and syntax. This test consists of one hundred English sentences, each followed by an incomplete translation in Spanish, which the examinee is told to complete.

* Published by the World Book Company, Yonkers, N. Y.


California Arithmetic Test (Upper Primary, 
Grades 3 and 4)* 


This test is part of a comprehensive educational achievement 
battery, but it may be given as a separate examination. Its objec- 
tive is to test for skills in fundamental operations, the identifica- 
tion of consistently made errors, and the ability to apply what is 
known to the solution of problems. The eight sub-tests cover the 
four fundamental processes (addition, subtraction, multiplication, and division), facility and skill in following directions involving numbers, and simple “mental arithmetic” problems.


The Nelson-Denny Reading Test** 


The authors state the objectives of this test as follows: to pre- 
dict success in college, to enable a sectioning of college and 
high-school classes on the basis of reading skills, to aid in the 
diagnosis of scholastic difficulties. The examination consists of 
two parts, a test of vocabulary and a test of the ability to read
and understand fairly difficult prose. There are one hundred 
words in the vocabulary test, each word followed by five choices, 
one of which is to be marked as correct. The paragraph-reading 
test is made up of nine selections of approximately two hundred 
words each. Four questions are asked on each paragraph. There 
are five optional answers for each question, one of which is to 
be selected by the examinee. It seems clear that the test measures 
basic knowledge of language as well as the ability to use this 
knowledge intelligently.


* Published by the California Test Bureau, Los Angeles, Calif.
** Published by the Houghton Mifflin Co., Boston, Mass.

SELECTING ITEMS FOR THE TEST

In the construction of an examination, both the content and the form of the question must be considered.
Deciding on the Type of Item

The teacher must first decide what type of objective item he wishes to use. True-false and multiple-choice items are favorites for measuring basic knowledge, and multiple-choice, matching, completion and essay recall items are all used to assess understanding, interpretation and application. It is probably less confusing for younger students if a sub-test or section contains items of only one type and does not switch from one kind to another. The test-maker should start with a much larger number of items than he plans to have in the completed test. All the questions should be read by other teachers of the subject and criticized for form and for content. Items judged to be trivial, inappropriate, ambiguous, or too narrow in scope should be revised or discarded. The items which survive this preliminary inspection should still number considerably more than the number of items planned for final use. An excess of items is necessary, since some items will always be discarded as a result of the item analysis to follow.


Arranging the Items in Order of Difficulty

The questions are now arranged in a rough order of difficulty, from easy to hard. For the first try-out, the difficulty of each item as judged by several teachers is sufficient. The test, as tentatively drawn up, is now administered to a sample of the students for whom the final test is intended, for example, to fifth-grade pupils or high-school freshmen. If several teachers of the subject co-operate, and thus increase the size of the experimental group, the final test will be a better examination than it will be if it is administered to a single class. It is always advisable to get as much information as possible on each item. Hence those examinees who take the examination in preliminary form should be urged to attempt every item, even when they are not sure of the answer. The time allowance for the try-out should be generous, so that every student will have a chance to try every item. This may make it necessary to have a second testing period.


a 
Setting the Time Limit

The length of the time interval set for the test when put into final form will depend on the time available for testing, most often one period of about fifty minutes. Time allowances must always take into consideration the age of the pupils, the type of item (amount of computation or reading needed in answering it), whether the test is primarily for survey purposes or for diagnosis, and whether speed, power, or both are deemed important. In examinations which are strictly power tests, the time limits should be long enough for all but the very slowest examinees to finish. Sometimes, naturally, an examination has to be cut in length in order to have it fit into the available time.


ITEM ANALYSIS 


The two characteristics of an item which we need to know about in building a test are (a) difficulty and (b) validity, or discriminative power. These two determinants of an item’s goodness are computed from the same tabulation of the test data. Computation of the difficulty and validity of an item is called item analysis.


Difficulty and Validity in Item Analysis

The difficulty of an item depends on how many of the examinees in the try-out group answer it correctly. An item answered correctly by 90 per cent of the group is obviously easier than one answered correctly by 50 per cent or by 10 per cent, the last being a hard item. Very hard and very easy items are ordinarily less useful than items of intermediate difficulty (page 216). The validity or discriminative power of an item depends on how well it distinguishes between the brightest and dullest pupils in the group. If all of the members of the experimental group answer an item correctly, or if none does, the item has no validity, since in neither case does it separate the good from the poor members of the class.
Biserial r in Item Validity*

The authors of most standard mental tests have used the biserial r method (or some approximation to it) in determining the validity of the items in their tests. By means of biserial r, we can compute the correlation between success and failure on a single item and size of total score on the test, or on some other measure of performance taken as the criterion. The size of the correlation between item and test score shows how well the item is working together with the other items, that is, whether it is a member of a team. Items unrelated to total score are discarded.
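To make the idea of item-total correlation concrete, the following Python sketch computes biserial r from a whole try-out group using the usual textbook formula

$$r_{\text{bis}} = \frac{M_p - M_q}{\sigma_t} \cdot \frac{pq}{y}$$

Here M_p and M_q are the mean total scores of the pupils who pass and who fail the item, σ_t is the standard deviation of all total scores, p is the proportion passing, q = 1 − p, and y is the ordinate of the unit normal curve at the point dividing p from q. This is only a sketch and not the procedure the text follows. The steps below read biserial r from tables built on the top and bottom 27 per cent, so values from this function will not agree exactly with those tables. The function name and argument layout are illustrative assumptions.

```python
from __future__ import annotations

from statistics import NormalDist, fmean, pstdev


def biserial_r(item_right: list[bool], total_scores: list[float]) -> float:
    """Whole-group biserial correlation between one item and total score.

    item_right[i] is True if pupil i passed the item; total_scores[i] is
    that pupil's total score on the test.
    """
    if len(item_right) != len(total_scores):
        raise ValueError("need one item response per total score")
    passed = [s for ok, s in zip(item_right, total_scores) if ok]
    failed = [s for ok, s in zip(item_right, total_scores) if not ok]
    if not passed or not failed:
        # An item passed by everyone or by no one has no validity.
        raise ValueError("item must be passed by some pupils and failed by others")
    p = len(passed) / len(total_scores)
    q = 1 - p
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    # Height of the normal curve where the area is divided into q (below) and p (above).
    y = unit_normal.pdf(unit_normal.inv_cdf(q))
    sigma_t = pstdev(total_scores)  # standard deviation of the whole group
    return (fmean(passed) - fmean(failed)) / sigma_t * (p * q / y)
```

Biserial r assumes that a normally distributed ability lies beneath the pass-fail split, and it can exceed 1.00 when that assumption fails badly. For example, with ten evenly spaced scores (1 to 10) and an item passed by exactly the top five pupils, the function returns about 1.09.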
Steps in the determination of item validity by use of biserial r are as follows:

1. Arrange the test papers in order for total score from highest to lowest.

2. Count off the highest and lowest 27 per cent** of the papers, if not exactly, as nearly so as possible. If there are 120 children in the “standardizing group,” for example, put 32 in the top and 32 in the bottom groups.

3. Determine the number in the high group and the number in the low group who pass each item, and express these figures as percentages. Suppose, for example, that Item 18 is passed by 60 per cent of the high group and by 30 per cent of the low group. Then from tables prepared for the purpose,† we read that the biserial correlation between this item and the whole test is .31. If an item is passed by 24 per cent of the high group and by only 3 per cent of the low group, the biserial r is .44. In general, any item with a biserial r of .20 or more can be taken to be valid if the test is fairly long. In a short test, items of higher validity are needed. Both hard items and easy items are valid (that is, have discriminative power) if they separate the high and low groups. An item passed by 15 per cent of the high group and only 1 per cent of the low group (a very hard item), for example, has a biserial r of .47, whereas an item passed by 92 per cent of the high group and 65 per cent of the low group (an easy item) has a biserial r of .39. Both are good items, though they differ greatly in difficulty.

* For the computation of biserial r, see references at the end of Chapter 2.
** The reason for choosing 27 per cent is this: when the distribution of ability is normal, the sharpest discrimination between extreme groups is obtained when item analysis is based upon the highest and lowest 27 per cent in each case. When larger per cents are in the high and low groups, the reliability of the determination is higher, but the difference between the two groups decreases. On the other hand, when per cents in the high and low groups are smaller, reliability falls off but the difference between the two groups increases.
† Table by Chung-Teh Fan, published by the ...


4. Determine the difficulty of each item by averaging the percentages that pass it in the high and low groups. An item passed by 60 per cent of the high group and by 30 per cent of the low group, for example, has a difficulty index of .45, that is, (.60 + .30)/2; and an item passed by 15 per cent of the high and 1 per cent of the low groups has a difficulty index of .08. This summary method of obtaining difficulties of items is not as accurate as is the practice of using the whole group, but it saves time and is precise enough for most tests.


5. It can be shown mathematically that items with difficulty indices of .50 or thereabouts are the best items, in the sense of being able to differentiate among the largest number of good and poor students. Not many items, of course, will be found with difficulty indices of exactly .50; the range of difficulties usually runs from above .90 to below .10. If the test is to cover a wide range of talent (and that is what is wanted in most school examinations), a good plan to follow in selecting items is as follows:

Of items passed by 85-100 per cent (very easy), take about 15 per cent
Of items passed by 50-85 per cent (fairly easy), take about 35 per cent
Of items passed by 15-50 per cent (fairly hard), take about 35 per cent
Of items passed by 0-15 per cent (hard to very hard), take about 15 per cent

All of the items should, of course, have satisfactory discriminative power. Note that the different proportions at the difficulty levels follow the normal distribution. Items passed by 100 per cent or by nobody have no validity, but sometimes an author will place several very easy items at the beginning of the test for psychological effect, and a few very hard items at the end to test the very bright pupils.

6. In using multiple-choice items, it is important for the improvement of the examination to know to what extent good and poor students have chosen the various distractors. If the wrong answers are illogical, obviously absurd, or otherwise not very misleading, the examinee will have little difficulty selecting the right option. The item is easier than it would have been had the misleads been more attractive. Information concerning the efficacy of misleads can be obtained by tallying the responses of the high and low groups to each mislead, as shown below. The group considered is the 120 children referred to in the illustration above, and there are 32 (27 per cent) in the high and 32 in the low groups. The item is of the multiple-choice type with four options, and the correct answer is keyed as (b).

              a     b     c     d    Omissions   Total
High group    ?    26     ?     2        0         32
Low group     3     7    10    12        0         32

It is clear that distractor (a) needs to be rewritten, since only a few of the sixty-four pupils chose it. Otherwise, the item differentiates good and poor students rather well.

The next example shows a slightly different situation. Here (c) is keyed as the correct answer.

              a     b     c     d    Omissions   Total
High group    0    15    11     5        1         32
Low group     5    10     9     8        0         32

Mislead (b) is chosen by more of the good students than is the correct answer (c); and this is true, too, of the poor students. Obviously, mislead (b) must be made less attractive or otherwise changed so that it doesn’t compete so strongly with (c). Furthermore, (c) might be strengthened and (a) examined further to see why it failed to attract any answers in the high group.


FIGURE 9-1 Item Analysis Data for Test File

FRONT OF CARD

Item 36: What marked change took place in the political status of India in the year 1947?
1. She received a mandate from the United Nations.
2. She won her independence from Britain.
3. Her people were united under Mohammedan rule.
4. She joined the Arab League.

BACK OF CARD

Item 36:        1     2     3     4    Omissions
High group:    10    32     6     ?        0
Low group:     19     ?    13     2        ?

Sample: 200 high school seniors, tested in June, 1953
Validity: biserial r = .41
Difficulty: 39 per cent


7. Many teachers find it profitable to keep a file of items for future use. A good plan is to write the item on one side of a card. On the back of the card should be listed (1) the size and character of the experimental group on which the data are based, (2) the validity of the item (its biserial r with the test score), (3) the difficulty of the item, and (4) data on misleads. Figure 9-1 shows these data on an item taken from a test in contemporary history. When a teacher has accumulated a large file of items, tests of approximately the same range and validity may be made up as needed.
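Steps 1 through 4, together with the tally of misleads in step 6, are simple counting, as the following Python sketch shows. It assumes a hypothetical `Paper` record holding each pupil's total score and the option letter chosen on each item, with an empty string for an omission. The biserial r in step 3 is read from Fan's tables in the text and is not reproduced here.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    """One pupil's paper (hypothetical layout)."""
    total: int           # total score on the whole test
    answers: list[str]   # option chosen on each item; "" marks an omission


def extreme_groups(papers: list[Paper], fraction: float = 0.27) -> tuple[list[Paper], list[Paper]]:
    """Steps 1-2: rank papers by total score and take the top and bottom fraction."""
    ranked = sorted(papers, key=lambda p: p.total, reverse=True)
    n = round(len(ranked) * fraction)  # 27 per cent of 120 papers -> 32
    if n == 0 or 2 * n > len(ranked):
        raise ValueError("group too small for this fraction")
    return ranked[:n], ranked[-n:]


def proportion_passing(group: list[Paper], item: int, key: str) -> float:
    """Step 3: proportion of a group choosing the keyed answer on one item."""
    return sum(p.answers[item] == key for p in group) / len(group)


def difficulty_index(high: list[Paper], low: list[Paper], item: int, key: str) -> float:
    """Step 4: average of the proportions passing in the high and low groups."""
    return (proportion_passing(high, item, key) + proportion_passing(low, item, key)) / 2


def tally_options(group: list[Paper], item: int, options: str = "abcd") -> dict[str, int]:
    """Step 6: how many in a group chose each option, plus omissions."""
    counts = Counter(p.answers[item] for p in group)
    row = {opt: counts[opt] for opt in options}
    row["omissions"] = counts[""]
    return row
```

Given 120 papers, `extreme_groups` returns two groups of 32, as in step 2. An item passed by 60 per cent of the high group and 30 per cent of the low group receives a difficulty index of .45, up to floating-point rounding, which matches step 4.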


A SHORT METHOD OF ITEM ANALYSIS

It is wise for a teacher to understand what the biserial coefficient of correlation means and what it does, since the device is utilized in many standard tests and is frequently mentioned in the literature of testing. At the same time, it is not necessary for the teacher to employ the method in order to construct good classroom examinations. The difference between a simple count of “rights” in selected fractions of the best and poorest pupils will suffice as a measure of the validity or discriminative power of an item. First, the items should be gone over by several teachers, the unsatisfactory items discarded, and the remaining items arranged in order of difficulty, this determined by the judgment of the teachers reviewing the items. Next, the test as tentatively drawn up is administered to a sample of children drawn from the classes or age levels to be tested. From here on the steps are as follows:
1. Arrange the test papers in order for size of total score, from the highest to the lowest.

2. Take the 25 per cent* of the best papers and the 25 per cent of the poorest papers. If the total group is small (for example, under fifty), take some larger proportion, say the upper half and the lower half. Suppose there are eighty pupils in the experimental sample (try-out group), so that twenty, or 25 per cent, fall in the high group and twenty in the low group. Each item may then be examined to see whether it is able to separate these two criterion groups.

3. Determine the number in each of the two criterion groups who answer each item correctly. If fifteen in the high group answer an item correctly, and five in the low group get the item right, the validity is 15 − 5, or 10, and the validity index is 10/20, or .50.** If all twenty in the high group answer an item correctly, and none of the low group gets it right, the validity of the item is maximal: 20 − 0 = 20, and the validity index is 20/20, or 1.00. The lowest validity index of an item by this method is, of course, 0/20, or .00. Validity indices run, therefore, from 0 to 1. There may be a few items of negative validity (more rights in the low than in the high group), but such items are rare. Items having zero or negative validity must be rewritten before they are used, or discarded if salvage is impossible.

4. If $R_H$ = number right in the high group and $R_L$ = number right in the low group, the discriminative power of an item is simply $R_H - R_L$, or $(R_H - R_L)/N_H$ when written as a validity index. Using the same nomenclature, we may write the difficulty index of an item as $(R_H + R_L)/(N_H + N_L)$, in which $N_H$ and $N_L$ are the numbers in the high and low groups, respectively. In our example above, wherein $R_H = 15$ and $R_L = 5$, the validity index is 10/20, or .50, and the difficulty index is (15 + 5)/(20 + 20), or .50. If $R_H = 18$ and $R_L = 12$, the validity index is 6/20, or .30, and the difficulty index is 30/40, or .75. Again, if $R_H = 10$ and $R_L = 2$, the validity index is 8/20, or .40, and the difficulty index is 12/40, or .30.

5. Select the items having the highest validity indices for the final test. Then follow the table on page 216 in apportioning items to the various levels of difficulty, if the test is to cover a fairly wide range of talent.

6. It is advisable to examine the misleads when multiple-choice items are to be used. The method outlined on page 217 will aid in locating distractors which are too plausible or not plausible enough. The first kind are too often accepted, and the second are taken by only a few.

7. A card file of acceptable items will prove useful when a teacher wants to lengthen a test or to replace non-functioning items. When there are a number of items, a parallel form of the test can be drawn up.

* Unless the biserial r method is used in determining validities, there is no need to observe the somewhat unwieldy 27 per cent rule; 25 per cent or any convenient larger percentage will serve.
** Validities can be left simply as the difference between the number right in the two extreme groups. The purpose of the validity index is to put validities on a percentage scale, as are the difficulties.
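The short method reduces to two one-line formulas. The Python sketch below implements them and prints the worked figures from step 4. The `apportion` helper is an added convenience and is not part of the text. It splits a chosen test length across the four difficulty levels of the 15/35/35/15 plan given in step 5 of the biserial method. Its largest-remainder rounding is an assumption for test lengths that do not divide evenly. All function names are hypothetical.

```python
from __future__ import annotations


def validity_index(r_high: int, r_low: int, n_high: int) -> float:
    """(R_H - R_L) / N_H; negative when the low group does better than the high group."""
    return (r_high - r_low) / n_high


def difficulty_index(r_high: int, r_low: int, n_high: int, n_low: int) -> float:
    """(R_H + R_L) / (N_H + N_L): the proportion of both criterion groups passing."""
    return (r_high + r_low) / (n_high + n_low)


# Share of the final test (in per cent) for each difficulty level.
BANDS = (
    ("passed by 85-100 per cent (very easy)", 15),
    ("passed by 50-85 per cent (fairly easy)", 35),
    ("passed by 15-50 per cent (fairly hard)", 35),
    ("passed by 0-15 per cent (hard to very hard)", 15),
)


def apportion(n_items: int) -> dict[str, int]:
    """Split a test length across the difficulty levels.

    Integer arithmetic avoids floating-point surprises; items left over after
    rounding down go to the levels with the largest remainders.
    """
    quotas = [divmod(n_items * pct, 100) for _, pct in BANDS]
    counts = [whole for whole, _ in quotas]
    leftover = n_items - sum(counts)
    by_remainder = sorted(range(len(BANDS)), key=lambda i: quotas[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return {label: count for (label, _), count in zip(BANDS, counts)}


if __name__ == "__main__":
    for r_high, r_low in [(15, 5), (18, 12), (10, 2)]:
        v = validity_index(r_high, r_low, 20)
        d = difficulty_index(r_high, r_low, 20, 20)
        print(f"R_H={r_high:2d} R_L={r_low:2d}  validity {v:.2f}  difficulty {d:.2f}")
    print(apportion(40))
```

A 40-item test under this plan gets 6, 14, 14 and 6 items at the four levels. Only items that also have satisfactory validity indices should be placed in those slots.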
Table 9-1 shows the sort of data which we can expect to get in an item analysis of questions administered to a sample of 80, as described above. The full table, of course, would contain data on all the items and on all the members of the two criterion groups. Half of the scores in the high and in the low groups are not shown in order to shorten the table, but these omitted scores are included in the totals upon which the item analysis is based. Each of the two criterion groups (the high and the low) consists of twenty examinees.

Examination of Table 9-1 shows that Item 4 is highly valid and that Items 1, 2, and 5 are acceptable. Item 3 has no validity and must be dropped or changed drastically. An item with a validity index of .20 or more may be considered satisfactory, at least tentatively. This figure is arbitrary, however. If the test is shortened, the acceptable point for a validity index should be raised; if the test is lengthened, it should be lowered. An item with a validity index greater than 0 has some validity. In Table 9-1 the difficulty indices range from .70 (a fairly easy item) to .30 (a fairly hard item).


SCORING THE COMPLETED TEST

If the completed test is cast in T-F form, the point scores will be the number right, or R − W if we wish to correct for guessing (page 191). In multiple-choice tests, the correction for guessing is

$$\text{Score} = R - \frac{W}{n - 1}$$

in which n is the number of choices or options. It is sometimes advisable to go without the correction for guessing with T-F items, but the correction is satisfactory in multiple-choice tests, provided ...
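The correction for guessing is easy to apply once rights and wrongs have been counted against a key. In this Python sketch, `Fraction` keeps corrected scores exact (for instance 37 2/3 rather than a rounded decimal). Representing the key as a list of option letters and an omission as `None` is an assumption made for illustration, and the function names are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def corrected_score(right: int, wrong: int, n_options: int) -> Fraction:
    """Correction for guessing: Score = R - W / (n - 1).

    Omitted items are simply not counted, neither as right nor as wrong.
    With n = 2 (true-false) this reduces to R - W.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return right - Fraction(wrong, n_options - 1)


def score_paper(answers: list[str | None], key: list[str], n_options: int) -> Fraction:
    """Score one paper against a key; None marks an omitted item."""
    if len(answers) != len(key):
        raise ValueError("one answer (or None) is needed per keyed item")
    right = sum(a == k for a, k in zip(answers, key))
    wrong = sum(a is not None and a != k for a, k in zip(answers, key))
    return corrected_score(right, wrong, n_options)
```

For example, on a 50-item test with five options, a pupil with 40 right, 8 wrong and 2 omitted scores 40 − 8/4 = 38. On a true-false test (n = 2) the same function gives R − W.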


TABLE 9-1

Item Analysis of the First Five Items of a Test Made Upon Two Criterion Groups, the Highest and the Lowest 25 Per Cent in Total Score. N_H = N_L = 20, and N = 80

(✓ = right, 0 = wrong)

In order of merit     Total test score       ITEMS
                      (in order of size)     1    2    3    4    5

Highest Group (best 25 per cent)
 1                    72                     ✓    ✓    ✓    ✓    ✓
 2                    70                     0    ✓    0    ✓    0
 3                    68                     ✓    ✓    ✓    ✓    ✓
 4                    65                     ✓    0    0    ✓    0
 5                    65                     ✓    ✓    ✓    ✓    ✓
 6                    65                     ✓    ✓    ✓    ✓    ✓
 7                    63                     ✓    0    0    ✓    0
 8                    61                     ?    ✓    ✓    ✓    ✓
 9                    60                     ✓    ✓    0    ✓    0
10                    60                     ✓    ✓    ✓    ✓    ✓
20                    54                     0    ✓    0    ✓    ?
R_H                                          15   16   10   20   8

Lowest Group (poorest 25 per cent)
 1                    35                     ?    ?    ?    ?    ?
 2                    34                     0    ✓    ✓    0    0
 3                    30                     ✓    ✓    ✓    ✓    0
 4                    30                     ✓    0    0    0    0
 5                    27                     0    0    ✓    0    0
 6                    25                     ✓    0    ✓    ✓    ✓
 7                    25                     0    0    0    ✓    ✓
 8                    24                     ✓    0    ✓    ✓    0
 9                    23                     ✓    ✓    0    0    ✓
10                    23                     0    0    0    ✓    ✓
20                    ?                      ?    ?    ✓    0    0
R_L                                          ?    ?    ?    ?    ?

(R_H − R_L)/N_H                              ?    ?    ?    ?    ?
(R_H + R_L)/(N_H + N_L)                      ?    ?    ?    ?    ?
In most cases, it is sufficient for the teacher to express standing
on the test in point scores or totals. If several classes have been 
tested and it is desirable to compare their performance, percentile 
ranks will be useful. Scaling of teacher-made tests in standard 
scores or normalized scores is not recommended unless the test 


is to be used throughout a school system. 
Directions for the final test should be explicit, and time limits should be given. Manuals for standardized tests may be consulted with profit for pointers on directions and time limits. A test should not be so long that most students cannot finish in the time allowed.

The use of scoring stencils will speed up marking when many papers are to be examined. In T-F tests, a strip containing the answers (a key) may be laid alongside the left-hand margin and the answers checked as right or wrong, or simply the right answers checked. Separate scoring sheets are useful with multiple-choice and matching items. Spaces are numbered on the answer sheet for recording answers to the questions on the test. The test blank itself is not marked and may be used more than once.


THE RELIABILITY OF THE COMPLETED TEST

The readiest method of estimating the reliability of a teacher-made test, which rarely has more than one form, is by what is called the “split-half” technique. In this procedure, the test is administered only once to a sample of examinees, and is then divided into two half-tests. The first half-test contains the odd-numbered items (1, 3, 5, and so on) and the second half-test the even-numbered items (2, 4, 6, and so on).* The correlation between scores on the two half-tests is now found, and from this r the reliability of the whole test with itself (its self-correlation) is predicted by the well-known “prophecy formula.”* To illustrate, suppose that in a class of ten seventh graders, an English Literature test in multiple-choice form has an odd-even correlation of .50. What is the probable self-correlation of the whole test? The prophecy formula is

$$r_{\text{whole test}} = \frac{2 \times r_{\text{half-test}}}{1 + r_{\text{half-test}}}$$

Substituting r = .50 for the self-correlation of the half-test, we have

$$r_{\text{whole test}} = \frac{2 \times .50}{1 + .50} = .67$$

* Note that when a test is split into odd and even items, the range of difficulty in the two halves is the same and the split is unique. Not just any split into two half-tests is satisfactory.

This is a satisfactory reliability coefficient (.67) for a single class.
For standardized tests administered to very large groups of a wide
range of talent, reliability coefficients will ordinarily be higher,
.90 or more. For teacher-made tests, however, the reliability
coefficients will rarely be more than .60 to .70. Reliability is higher
over several grades, that is, when the test is given to more than
one grade and the range of talent is correspondingly wider. The
standard error of a test score can be computed by the formula
given on page 29, but for the teacher-made test this is often a
needless refinement.
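The odd-even procedure and the prophecy formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's procedure: items are scored 1 (right) or 0 (wrong), `pearson_r`, `spearman_brown`, and `split_half_reliability` are hypothetical helper names, and the four-pupil data set is invented (far too small for a real reliability estimate).

```python
def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Prophecy formula: self-correlation of the whole test from the half-test r."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_marks):
    """item_marks: one list of 0/1 item scores per pupil, in item order."""
    odd = [sum(marks[0::2]) for marks in item_marks]   # items 1, 3, 5, ...
    even = [sum(marks[1::2]) for marks in item_marks]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)


print(round(spearman_brown(0.50), 2))  # 0.67, the worked example in the text

# Hypothetical marks for four pupils on a six-item test.
marks = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
r_half, r_whole = split_half_reliability(marks)
print(round(r_half, 2), round(r_whole, 2))  # 0.94 0.97
```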

Reliability coefficients for a teacher-made test should always
be computed from a new class, never from the sample used in
determining the validities of the test items. Self-correlation in
the standardization group will always be spuriously high, because
the selection of items was based on the scores of the high and
low members of the sample.


VALIDITY OF THE COMPLETED TEST 


A teacher-made test in physics or French, for example, will
always have content validity, even when the sampling is quite
narrow. Teacher-made tests rarely cover as much material as do
the standard printed tests. An approximate measure of validity
for a test can be found by correlating test scores against school
grades in the same subject. This method is not entirely
satisfactory, since school marks are rarely more dependable measures
of the subject matter than are the tests. When experimental
validity is attempted by correlating scores on a teacher-made test
with grades or with other test scores, a new group must always
be utilized. Such validation, called cross validation, is necessary
because the group used in item analysis is a special group which
has served to select the items in the first place. Cross validation is
necessary also when the two criterion groups (the upper and
lower extreme groups) are selected on the basis of school grades.
A teacher-made test will of necessity correlate with grades
achieved by this group, since the group selected the items.

Perhaps the best way to judge the value of a teacher-made test
is by its predictive validity. If the test aids the teacher in getting
a better notion of the individual differences within the class, and
leads to better understanding of the difficulties of the students
(meager knowledge, wrong knowledge, and so on), it has
fulfilled its purpose.

* The Spearman-Brown prophecy formula is treated in all standard texts
dealing with statistical method in psychology and education.
SUGGESTIONS FOR LABORATORY WORK

1. Assume that you have tried out 50 T-F items on a class of 40 pupils.
Draw up a table similar to that of Table 9-1 showing how you would carry
out the analysis.
2. Construct a test, using your class as the standardizing sample.
Multiple-choice items in arithmetic and vocabulary taken from
E. L. Thorndike's The Measurement of Intelligence may be used
conveniently.* Administer a test of about fifty items and item-analyze the
results by the method given on pages 219-223.
3. Take a test which has been given to this class or to some other class.
Analyze the questions for validity, following the method on pages ...

* E. L. Thorndike's book gives items by levels over a wide range of
difficulty.

QUESTIONS FOR DISCUSSION

1. A sixth-grade teacher has administered a test in arithmetic.
What analyses of the test data could this teacher make which
would (a) help his future teaching, (b) be of value to individual pupils?
2. Under what conditions would it be profitable to correct
a multiple-choice test for guessing?
3. In some schools, … makes all the … subject. What are the
advantages and disadvantages of this procedure?
4. What might an item of zero validity mean?
5. What might an item of negative validity mean?


CHAPTER 10 


SOME PROBLEMS IN THE EVALUATION 
OF TEST SCORES


Interpreting Multiple Aptitude Test Scores
Table 10-1 gives the scores achieved by ten ninth graders on
the Differential Aptitude Tests (DAT). Scores on any test are
more meaningful when supplemented by school grades and by a
knowledge of personality traits and ambitions. With this proviso
in mind, it will be worthwhile to answer the questions below with
reference only to the percentile ranks in Table 10-1.
QUESTIONS ON TABLE 10-1
1. Which students show the poorest scholastic ability?

2. Which student exhibits the most consistently high level of
ability?

3. Which students are likely to have reading difficulties? 

4. Which girl should do well in secretarial work? 

5. If Joe Kramer wants to go to college, would you encourage
him to plan to go into engineering? 

6. Would you encourage Larry Edwards to go into his father's 
accountancy firm after graduation from high school? 

7. Jane Goodrich plans to become a medical technician. Would 
you recommend this vocational goal? 

8. Which students will probably find it hard to graduate from 
high school?

9. Frank Seay's father is an auto mechanic and Frank is
interested in this work. Do you think it a wise vocational choice?

10. Is it likely that several students are handicapped by poor 
spelling and language usage? Why? 


Case Studies in Evaluation of Abilities 
The three case studies which follow provide considerable data
about three pupils, two in high school and one in elementary
school. Questions are planned to focus upon things to look for in
evaluating the promise of the pupils being considered.

I. Case Study of Robert T. 

Robert is 16-2, a sophomore in high school. He is well-grown
and makes a good appearance. He is well behaved and quiet, is
interested in sports, though not as a participant, and does not read
much. Robert's father is a house painter; both parents are
high-school graduates. Robert wants to go to college, and is
encouraged to do so by his parents. He wants to be an engineer.


School Data
Ninth Grade                 Tenth Grade (First term)
English               C     English               C
Social Studies        B     Social Studies        D
Mathematics           B     Physics               B
General Science       B     French                D
Physical Education    C     Physical Education    C
Test Data
Otis Quick Scoring (Form Gamma) IQ               112
California Mental Maturity (Language) IQ         110
California Mental Maturity (Non-language) IQ     121
Cooperative General Achievement Test: Percentile Ranks
  I. Social Studies      38
  II. Natural Science    52
  III. Mathematics       36
Kuder Preference Record (Vocational): Percentile Ranks
  Mechanical         63
  Computational      51
  Persuasive         15
  Artistic           12
  Literary           46
  Musical            51
  Social Service     20
  Clerical           50
  Scientific         72


1. What do you think of Robert's chances of succeeding in
college?

2. Robert's interests are in the mathematics-science area; are
they strong enough for him to plan engineering as a vocation?

3. Robert's school record is weak in English and Social Studies,
and his interests do not lie in persuasive and artistic fields. What
occupations would you encourage him not to enter?

4. Is the variation in Robert's IQ's too great to arise from
chance?

5. How do you interpret the difference between Robert's
language and non-language IQ's?

6. Would you say that Robert's school grades are not in keeping
with his IQ?

7. Do you think Robert might be more successful as a
technician than as an engineer?

8. Would you recommend that Robert become a salesman?

9. Do Robert's interests jibe with his achievement test records?
With his school marks?

10. Might Robert do well as an airplane pilot?


II. Case Study of William S.

William is 18-1, a senior in high school. He makes a good
appearance and is husky and muscular. William is easy-going and
affable; he likes to hunt and is interested in, and good at, sports.
His father is a successful lawyer and his mother is a college
graduate interested in club activities. The parents have planned
for William to study medicine: his grandfather was a well-known
physician in the community. William has accepted these
vocational plans but says he is more interested in business and
sales work.


School Data
Tenth Grade                 Eleventh Grade
English               B     English               C
Social Studies        B     Social Studies        B
Mathematics           D     Mathematics           D
Physics               C     Spanish               B
Physical Education    B     Physical Education    B
Test Data
Terman-McNemar Test of Mental Ability IQ         118
California Achievement Tests (Advanced): Percentile Ranks
  Reading            65
  Mathematics        40
  Language           …
Differential Aptitude Tests (Tenth Grade): Percentile Ranks
  Verbal Reasoning               86
  Numerical Ability              42
  Abstract Reasoning             38
  Space Relations                40
  Mechanical Reasoning           32
  Clerical Speed and Accuracy    55
  Language Usage: Spelling       75
  Language Usage: Sentences      93
Kuder Preference Record (Vocational): Percentile Ranks
  Outdoor            96
  Computational      32
  Persuasive         90
  Artistic           40
  Literary           86
  Musical            54
  Social Service     36
  Clerical           40
  Scientific         …


1. Do you think that William is college material?

2. Would you encourage him to plan for medicine as a career?

3. Do William's grades verify his DAT scores?

4. Is language a strong area for William? Would you, on the
strength of this, suggest some other vocation than medicine for
William? If so, what?

5. What are William's strong interests, as revealed by the
Kuder Record?

6. Are William's achievement test scores in line with his
school grades?

7. Do you think William might be happier and more successful
in business? Or in the study of law? Give reasons for your
answers.

8. William's IQ does not jibe with his DAT scores. Can you
give any reasons why this should be so?

9. The Kuder scores are more helpful than the DAT in
counseling William. Would you agree with this judgment?

10. How would you explain to William's father his considerable
variability in scores? And how would you explain the
apparent contradictions?


III. Case Study of Mary S.
Mary is 11-8, in the second half of the sixth grade. She is
pleasant and well mannered, but is judged by her teachers to be
"nervous" and overanxious. Mary wants to be a teacher. Her
father is an auto salesman, with high-school education; her
mother is a housewife, with junior-college training. There are
three other children in the family.


School Data
Fifth Grade             Sixth Grade
Reading           C     Reading           B
Social Studies    C     Social Studies    C
Arithmetic        B     Arithmetic        C
Science           C     Science           C
Language          C     Language          B
Test Data
Kuhlmann-Anderson Intelligence Tests IQ          110
Metropolitan Achievement Tests: Grade Equivalents
  Reading                      6.1
  Vocabulary                   6.8
  Arithmetic Reasoning         …
  Arithmetic Comprehension     5.2
  English                      6.2
  Spelling                     6.6
  History                      5.6
  Science                      4.6
1. In what subjects is Mary weakest?
2. Would you encourage her to plan for teaching as a career?
3. Is Mary college material? Give reasons for your opinion.
4. Could Mary do office and clerical work successfully?
5. Would it help to have a Stanford-Binet IQ for Mary? Give
reasons for your answer.


Sociometric Testing

From observations in school and out, most teachers get a fairly
good idea of the social and personal relations within their
classrooms. They soon come to know which children are leaders,
which are well liked and popular, which are disliked or ignored,
and which are picked on and teased. It is sometimes valuable for
a teacher to have, in addition to his own opinion, some measure
of the attitudes and feelings of the pupils regarding each other.
When data of this sort are collected systematically, they may be
put into a table or expressed in the form of a sociogram. This
last is a pictorial or graphic representation of the interpersonal
relations within some specified group, often a class.

The usual procedure is to ask the pupils to designate the
classmate next to whom they would rather sit, or the child (or
children) with whom they would prefer to play ball at recess, or to
make some other choice of a companion in a real-life situation.
Table 10-2 shows the responses made by ten fifth-grade pupils when
asked to nominate their first and second choices of a child to
work with on a class project. (The table reproduces only part
of the data for a class.)

TABLE 10-2
Sociometric Tabulation
(CHOOSER in rows, CHOSEN in columns: David, Anita, Sally, Gary, Karl,
Janet, Jack, Helen, Laura, Ruth; summary rows give the 1st and 2nd
choices received by each child.)

A first choice is shown by a 1 under the name of the child
chosen, a second choice by a 2. An asterisk (*) denotes that the
choice for first place was mutual. Thus, David chose Gary and
was chosen by Gary, and Anita and Janet each named the other
as first choice. The summary at the bottom of the table shows
David to be the most chosen child, with three firsts and one
second. Janet is the next most popular, with three firsts. Sally is
chosen by no one, and three of the girls (Helen, Laura, and Ruth)
and one boy (Jack) receive no first choices. Tabulation of the
responses as given by the children will provide the "choice"
information that the teacher wants.

FIGURE 10-1 Sociogram for 21 Kindergartners, 13 Boys and 8 Girls
(Group: Kindergarten; Number: 21. Legend: strong choice, reciprocals,
partial reciprocals. Reproduced from Northway, Mary L., and Weld,
Lindsay, Sociometric Testing, University of Toronto Press.)
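The tabulation described above is easy to mechanize. The sketch below assumes each chooser's first and second choices are stored as a pair in a dictionary; the helper name `tabulate_choices` and the five pupils "A" to "E" are hypothetical and are not the data of Table 10-2. It counts the first and second choices each child receives, lists mutual first choices, and names the children whom nobody chose.

```python
from collections import Counter


def tabulate_choices(choices):
    """choices maps each chooser to a (first choice, second choice) pair."""
    firsts = Counter(first for first, _ in choices.values())
    seconds = Counter(second for _, second in choices.values())
    # A first choice is mutual when the child chosen first names the chooser first.
    mutual = sorted(
        (chooser, first)
        for chooser, (first, _) in choices.items()
        if chooser < first and choices.get(first, (None, None))[0] == chooser
    )
    # Children who receive neither a first nor a second choice.
    unchosen = sorted(child for child in choices if firsts[child] + seconds[child] == 0)
    return firsts, seconds, mutual, unchosen


# Hypothetical choices of five pupils (not the data of Table 10-2).
choices = {
    "A": ("B", "C"),
    "B": ("A", "C"),
    "C": ("A", "D"),
    "D": ("C", "A"),
    "E": ("A", "B"),
}
firsts, seconds, mutual, unchosen = tabulate_choices(choices)
print(firsts["A"], seconds["C"], mutual, unchosen)  # 3 2 [('A', 'B')] ['E']
```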

A more striking method of representing the social relations
within the group is afforded by the pictorial sociogram shown in
Figure 10-1. The responses were those of twenty-one kindergarten
children, thirteen boys and eight girls. The stars (popular,
often chosen children) are quickly located, as are also the
isolates, whom no one chooses. The two-headed arrows indicate
mutual choices.

When used wisely, a sociometric test can be helpful to a
teacher, especially when the class is too large for close personal
observation. Some of the things which a sociogram may reveal
are the following:


1. Good and bad personal relations, free interchange of
choices, or the existence of cliques.

2. Clusters and cleavages resulting from differences in race,
religion, sex, and economic conditions of families.

3. Differences between in-school and out-of-school social
groupings.


The sociometric method has some disadvantages and may do
more harm than good if the morale of the class is low because of
poor discipline, frequent change of teachers, or other disrupting
influences. For example, choices may be trivial or deliberately
false, or some pupils may take the "test" as an occasion to express
hostility and resentment against other pupils or against the
teacher. Moreover, the choices of young children are often
fleeting, vary from time to time, and are quite unreliable. The
sociogram, therefore, is not foolproof. At the same time, in the hands
of a skillful teacher, sociometric testing will often provide new
insights into the personality traits of pupils and thus aid in
discipline and in remedial work.
APPENDIX A


STATISTICAL SUPPLEMENT 


In order to understand and use test results wisely, a teacher 
should be familiar with those statistical concepts most often em- 
ployed in mental testing. One of the best ways to accomplish 
this is to work through the computation of the basic statistics. In 
Chapter 2 a number of statistical terms were defined and their
application illustrated. In subsequent chapters these statistics have
been frequently employed. If, when a statistic is first mentioned, 
the student will work through its derivation—for example, the 
tabulation of a frequency distribution or the computation of 
an r—the value of the statistic to mental testing will be clarified. 
A second or even a third review is often helpful. A good analogy 
here is the habit of looking up unfamiliar words in a dictionary. 
Sometimes a word must be looked up more than once before its
meaning is clearly grasped. 


This Appendix deals with the following topics: 


The Frequency Distribution
The Frequency Polygon and the Histogram
Averages: Mean, Median, and Mode
Measures of Variability: Range, Q, and SD (σ)
The Coefficient of Correlation

Drawing up a Frequency Distribution


Test scores are more readily dealt with when they have first
been organized into a frequency distribution. Suppose that Miss
Norton has administered a standard test to her class of forty
pupils in social studies, and that scores are as follows:


37, 38, 36, 31, 28, 33, 24, 19, 25, 34 
16, 43, 22, 20, 26, 44, 27, 19, 25, 34 
33, 24, 22, 20, 44, 27, 31, 28, 38, 17 
31, 26, 34, 17, 19, 20, 22, 24, 26, 29 


Table A-1 shows these forty scores tabulated into a frequency 
distribution in which the interval is five score units. Steps in 
setting up a frequency distribution follow: 


TABLE A-1
Frequency Distribution of Forty Scores on a Social Studies Test

Intervals   Midpoints   Tallies           f
40-44       42          |||               3
35-39       37          ||||              4
30-34       32          ||||| |||         8
25-29       27          ||||| |||||      10
20-24       22          ||||| ||||        9
15-19       17          ||||| |           6
                                     N = 40

(1) Determine the range, or the gap between the highest
and lowest scores. Examining our set of forty scores, we find the
range to be from 44 to 16, or 28.

(2) Select an interval which will be convenient for tabulation.
A good working rule is to take a grouping unit which will
yield from five to fifteen intervals. This rule may have to be
broken when the sample is very large (200 or 300, say) or very
small (less than 25).

(3) Divide the range by the interval size tentatively chosen.
This gives the approximate (within one) number of intervals.
In Table A-1 the range of 28 divided by five gives 5.6, and the
number of intervals is six. Five is a better choice than is a larger
or smaller unit. For example, an interval of three will spread the
data out too thin (into ten intervals), whereas an interval of ten
crowds all forty scores into three intervals.

(4) Write the beginning and end of each interval as a score: 
for example, 15-19. Actually a score of 15 represents the interval 
from 14.5 to 15.5—that is, a distance along an ability scale; and 
19 represents the interval 18.5 to 19.5. Hence, the lowest interval 
begins at 14.5 and ends at 19.5; the second interval begins at 19.5 
and ends at 24.5, and so on. Writing score limits instead of actual 
limits saves time and avoids the confusion which often arises 
when one interval ends and the next begins with the same figure. 

(5) Tally each score under its proper interval as shown in
Table A-1. Then write the sum of the tallies opposite each
interval under f (frequency). Sum the f's to give N.

Note that the midpoint of the topmost interval is 42, that is,
2.5 from 39.5 and 2.5 from 44.5. The midpoints have been
entered in the second column. When scores have been arranged
into a frequency distribution, all of the f's within a given interval
are represented by the midpoint of that interval.
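Steps (1) to (5) can be checked with a short Python sketch applied to Miss Norton's forty scores. It assumes integer scores and an interval width and starting point chosen beforehand (5 and 15, as in Table A-1); scores below `start` would be silently left out, so `start` should not exceed the lowest score. The function name `frequency_distribution` is hypothetical.

```python
SCORES = [
    37, 38, 36, 31, 28, 33, 24, 19, 25, 34,
    16, 43, 22, 20, 26, 44, 27, 19, 25, 34,
    33, 24, 22, 20, 44, 27, 31, 28, 38, 17,
    31, 26, 34, 17, 19, 20, 22, 24, 26, 29,
]


def frequency_distribution(scores, width, start):
    """Group scores into intervals of `width` score units, beginning at `start`.

    Returns (lower score limit, upper score limit, midpoint, f) rows,
    topmost interval first, as in Table A-1.
    """
    rows = []
    low = start
    while low <= max(scores):
        high = low + width - 1
        f = sum(1 for s in scores if low <= s <= high)
        rows.append((low, high, (low + high) / 2, f))
        low += width
    return rows[::-1]


table = frequency_distribution(SCORES, width=5, start=15)
print("range:", max(SCORES) - min(SCORES))  # range: 28
for low, high, midpoint, f in table:
    print(f"{low}-{high}  {midpoint:g}  {f}")
print("N =", sum(row[3] for row in table))  # N = 40
```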


The Frequency Polygon 


Figure A-1 shows the frequency polygon of the forty scores
tabulated into a frequency distribution in Table A-1. Two axes,
a horizontal or X-axis and a vertical or Y-axis, are drawn at right
angles. Score intervals are laid off at regular distances along the
X-axis, or baseline, beginning with 15, the lower limit of the
first interval. The six scores on the lowest interval are represented
by a point six units up on the Y-axis and just above 17, the
midpoint of interval 15-19. The nine scores on the next interval
are represented by a point nine units up on Y and just above 22,
midpoint of the interval 20-24. The other f's are plotted in
the same manner.

When all of the points are joined by short straight lines, we
have the outline of the frequency polygon. To complete the
figure, that is, to bring it down to the baseline at each end,
two intervals are added, one (10-14) at the low end and another
(45-49) at the high end. The f on each of these intervals may
be taken as 0, and hence the frequency polygon reaches the
X-axis at 12 and 47.

FIGURE A-1 Frequency Distribution of the Forty Scores in Table A-1
(X-axis: Scores; Y-axis: Frequencies)

In order to provide a well-proportioned figure, one which is neither
too squat nor too thin, units in X and Y must be carefully
chosen. A good rule is to select units which will make the height
of the figure about 2/3 of its length. In Figure A-1 the maximum
f (10) is about 2/3 the baseline length of the polygon.
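The points plotted in a frequency polygon, including the two empty end intervals, can be listed as below. This is a sketch only: it computes the (midpoint, f) vertices and leaves the drawing to graph paper or whatever plotting tool is at hand; `polygon_points` is a hypothetical name.

```python
def polygon_points(midpoints, freqs, width):
    """Vertices of a frequency polygon, lowest interval first.

    One empty interval is added at each end, so the outline returns to
    the baseline (f = 0) at the midpoints of those added intervals.
    """
    points = [(midpoints[0] - width, 0)]
    points.extend(zip(midpoints, freqs))
    points.append((midpoints[-1] + width, 0))
    return points


# Midpoints and f's of Table A-1, from interval 15-19 up to 40-44.
pts = polygon_points([17, 22, 27, 32, 37, 42], [6, 9, 10, 8, 4, 3], width=5)
print(pts)
# [(12, 0), (17, 6), (22, 9), (27, 10), (32, 8), (37, 4), (42, 3), (47, 0)]
```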


The Histogram

The frequency distribution of Table A-1 is again represented
in Figure A-2, this time by a histogram, or column diagram.
The main difference between the frequency polygon and the
histogram is that in the histogram the f's are represented by
small rectangles whose heights equal the f's on the intervals. In
Figure A-2, for example, the height of the first rectangle is 6,
its width being the length of the interval 14.5 to 19.5. Each
frequency rectangle begins at the actual lower limit of the
interval and ends at the actual upper limit. The histogram presents
the same facts as the frequency polygon and there is often little
to choose between them. When two or more frequency distributions
are represented on the same axes, however (as, for example,
the scores of two classes or two sections of the same class), the
frequency polygon is to be preferred to the histogram, because
the vertical and horizontal lines of the histograms coincide and are
often difficult to disentangle.

FIGURE A-2 Histogram of the Forty Scores in Table A-1
(X-axis: Scores, 15 to 45; Y-axis: Frequencies)


COMPUTATION OF AVERAGES 


There are three averages in common use: the mean, the median,
and the mode.


The Mean (M)


We have defined the M on page 20 as the statistic found by
dividing the sum of the scores by their number. When scores
are put into a frequency distribution, the scores classified within
any interval lose their identity and are represented by the
midpoint of that interval. This necessitates a slightly different
procedure from that used with unorganized scores.
In Table A-2, section A, the midpoint of each interval is
multiplied or "weighted" by the frequency which lies opposite
it in the f column. This gives the fX column, and the sum of
this column (1100 in Table A-2) divided by N (40) gives a
mean of 27.50. The formula is

$$M = \frac{\sum fX}{N}$$

where ΣfX is the sum of the products f × X and N is the
number of cases.

TABLE A-2
Computation of the Mean from a Frequency Distribution
Data are the forty scores in Table A-1.

A. LONG METHOD
Intervals   Midpoints (X)     f      fX
40-44       42                3     126
35-39       37                4     148
30-34       32                8     256
25-29       27               10     270
20-24       22                9     198
15-19       17                6     102
                         N = 40   ΣfX = 1100
M = 1100/40 = 27.50

B. ASSUMED MEAN METHOD (SHORT METHOD)
Intervals   Midpoints     f     x'     fx'
40-44       42            3      3       9
35-39       37            4      2       8
30-34       32            8      1       8
25-29       27           10      0       0
20-24       22            9     -1      -9
15-19       17            6     -2     -12
                     N = 40         Σfx' = 25 - 21 = 4
AM = 27; c = 4/40 = .10; ci = .10 × 5 = .50
M = AM + ci = 27.00 + .50 = 27.50

M can always be computed by the "Long Method" just
described, but it is generally computed by the Assumed Mean, or
"Short Method." When N is large, the Short Method reduces
calculation and saves time. Moreover, the Short Method is
mandatory when standard deviations and coefficients of correlation
are later to be computed from the same data. Computation of M
by the Assumed Mean or Short Method is shown in Table A-2,
section B. Steps are as follows:

(1) Assume a mean, called the AM, near the center of the
frequency distribution and if possible on the interval having
the largest f. In our example, the AM is taken at 27, midpoint of
interval 25-29, and this interval also has the largest f.

(2) In the column x', lay off deviations from the AM of 27
in units of interval. The midpoint of interval 30-34, that is, 32,
deviates five scores or one interval from 27; the midpoint
of interval 35-39 deviates two intervals from 27, and so on.
Below the AM, the deviations of the midpoints of the two
intervals 22 and 17 are -1 and -2. The midpoint of the interval
25-29, that is, 27, is the assumed mean, and 0 is entered in
the x' column opposite this interval.

(3) Multiply each x' by its f and enter the product in the fx'
column. The sum of this column is 4 (that is, 25 - 21), from which
the correction (c) is calculated. The formula is

$$c = \frac{\sum fx'}{N}$$

and c = 4/40 or .10 in our problem.

(4) Multiply c, the correction in units of interval, by the
length of the interval, or i, to give ci, the correction in score units.
In our example, ci = .10 × 5 = .50.*

(5) Add the correction, ci, to the AM to get M. In Table A-2,
section B, the M = 27.00 + .50, or 27.50, thus checking the
computation in A above.

* ci is the deviation of the M from the AM; that is, ci = M - AM.
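Both methods for the mean can be sketched in Python with the data of Table A-2. The helper names are hypothetical; `width` stands for the interval length i, and the deviations x' are computed in interval units exactly as in step (2). Calling `mean_short` with a different assumed mean (say 32) returns the same 27.5, since the correction ci absorbs whatever error was made in choosing the AM.

```python
MIDPOINTS = [42, 37, 32, 27, 22, 17]   # Table A-2, topmost interval first
FREQS = [3, 4, 8, 10, 9, 6]


def mean_long(midpoints, freqs):
    """Long Method: M = sum(fX) / N, each midpoint weighted by its f."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def mean_short(midpoints, freqs, assumed_mean, width):
    """Assumed Mean (Short) Method: M = AM + ci."""
    n = sum(freqs)
    # x' = deviation of each midpoint from the AM, in units of interval
    deviations = [(x - assumed_mean) / width for x in midpoints]
    c = sum(f * d for f, d in zip(freqs, deviations)) / n  # correction, interval units
    return assumed_mean + c * width                        # add ci, in score units


print(mean_long(MIDPOINTS, FREQS))                             # 27.5
print(mean_short(MIDPOINTS, FREQS, assumed_mean=27, width=5))  # 27.5
```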


The Median, or Mdn


The median is defined as that point in the distribution below
which and above which lie 50 per cent of the distribution. The
median is also described as the fiftieth percentile (P50) and the
second quartile (Q2). Computation of the median in a frequency
distribution is shown in Table A-3. (The Q, or quartile
deviation, which is found in the same way as the median, is
also computed in the table.)

TABLE A-3
Computation of the Median and Q from a Frequency Distribution
Data are the forty scores in Table A-1.

Intervals     f
40-44         3
35-39         4
30-34         8
25-29        10
20-24         9
15-19         6
         N = 40

N/2 = 20     N/4 = 10     3N/4 = 30
By formula, Median = 24.5 + 5((20 - 15)/10) = 27.00
By formula, Q3 = 29.5 + 5((30 - 25)/8) = 32.63
By formula, Q1 = 19.5 + 5((10 - 6)/9) = 21.72
Q = (32.63 - 21.72)/2 = 5.46

Steps are as follows:

(1) Take 1/2 of N and count into the distribution from the
low end until the interval containing the median is reached. In
Table A-3, N/2 = 20, and counting into the distribution from
interval 15-19, we locate the median on interval 25-29. The
two lowest intervals contain 6 + 9 or 15 f's, and it is clear from
this cumulated f that the twentieth score must fall on interval
25-29.

(2) Apply the following formula:

$$\text{Mdn} = l + i\left(\frac{N/2 - \text{cum } f_l}{f_m}\right)$$

in which
l = lower limit of the interval on which the Mdn lies
N/2 = 1/2 of the number of scores
cum f_l = sum of the scores on all intervals below l
f_m = frequency on the interval containing the Mdn
i = length of the interval

In our example in Table A-3, l = 24.5, lower limit of the interval
containing the Mdn; N/2 = 20; cum f_l = 15; f_m = 10; i = 5.
Substituting in the formula, we have

$$\text{Mdn} = 24.5 + 5\left(\frac{20 - 15}{10}\right) = 27.00$$

The median can be found by counting into the distribution from
either end, but it is generally easier to start at the low end.
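The interpolation in step (2) can be sketched as a general Python function that finds the score point below which any given fraction of the cases falls; with a fraction of .5 it gives the median. The function name `point_below` is hypothetical, and the sketch assumes the intervals are listed from the low end with their exact lower limits (14.5, 19.5, and so on).

```python
def point_below(lower_limits, freqs, width, fraction):
    """Score point below which `fraction` of the N cases fall.

    lower_limits are the exact lower limits of the intervals, lowest
    interval first; the interpolation is l + i * (target - cum f) / f.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(freqs)
    cum_below = 0
    for l, f in zip(lower_limits, freqs):
        if cum_below + f >= target:
            return l + width * (target - cum_below) / f
        cum_below += f
    raise ValueError("lower_limits and freqs do not match")


LOWER_LIMITS = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5]  # Table A-3, from 15-19 up
FREQS = [6, 9, 10, 8, 4, 3]
print(point_below(LOWER_LIMITS, FREQS, 5, 0.5))  # 27.0, the median
```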


The Mode
The mode is usually taken as the midpoint of the interval
which contains the largest f. In Table A-3, the mode is simply 27,
the midpoint of the interval 25-29. This "midpoint" mode is
often called the crude mode. The mode may be calculated more
accurately, but since it is usually a preliminary statistic it is
hardly worth while to do so.


COMPUTATION OF MEASURES OF VARIABILITY 


The means or medians of two distributions are often the same
or nearly so, while the spread or scatter of the scores around the
central point is quite different. One class, for example, may
show the same mean but a much greater range of talent than
another. Knowing the variability of performance within a class
may be more useful than knowing its average or typical
performance.

There are three measures of variability, all of which are used
in mental testing: the range, the Q, and the SD (σ).


The Range

The range is the gap between the smallest and largest scores.
The range is a useful statistic, but is often a rough measure. It
is least efficient when there are one or several outstanding scores,
either very large or very small. For example, suppose there is a
gap of 20 points between 75, the highest score, and 55, the score
next below it. Then if the lowest score in the set is 25, the single
outstandingly high score will increase the range from 30 to 50.
We had occasion to find the range in constructing the frequency
distribution (page 240).


The Q, or Quartile Deviation


Q, the quartile deviation, is defined as one-half the distance
between the seventy-fifth and twenty-fifth percentile points in
the distribution. To find these two percentiles, we must count
into the distribution as we do to find the median. In Table A-3,
for example, we count off 3/4 of N to get Q3 (the third quartile
or seventy-fifth percentile) and 1/4 of N to reach Q1 (the first
quartile or twenty-fifth percentile). The formula for Q3 is

$$Q_3 = l + i\left(\frac{3N/4 - \text{cum } f_l}{f_q}\right)$$

and the formula for Q1 is

$$Q_1 = l + i\left(\frac{N/4 - \text{cum } f_l}{f_q}\right)$$

in which
l = lower limit of the interval upon which the quartile point falls
i = length of the interval
cum f_l = cumulated f's up to the interval containing the quartile wanted
f_q = f on the interval which contains the quartile

In Table A-3, 3/4 of N is 30. Counting into the distribution
from the low end, twenty-five scores take us to 29.5, lower
limit of 30-34, which is the interval containing Q3. The f on
this interval is 8. Substituting in the formula, we have

$$Q_3 = 29.5 + 5\left(\frac{30 - 25}{8}\right) = 32.63$$

To obtain Q1 we count off 1/4 of N, or ten scores, as shown
in Table A-3. Six scores take us to 19.5, lower limit of the
interval 20-24, the interval which contains Q1. The f on this
interval is 9. Substituting in the formula, we have

$$Q_1 = 19.5 + 5\left(\frac{10 - 6}{9}\right) = 21.72$$

From the two quartile points, Q3 and Q1, we find Q by
substituting in the formula

$$Q = \frac{Q_3 - Q_1}{2}$$

and in our example, Q = (32.63 - 21.72)/2 = 10.91/2, or 5.46.
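The same counting-in procedure yields Q3, Q1, and Q. In the sketch below (hypothetical function name `interpolate`, data of Table A-3), the quartiles are left unrounded, so Q comes out as about 5.45; the 5.46 in the text results from rounding Q3 and Q1 to two decimals before subtracting.

```python
def interpolate(lower_limits, freqs, width, count):
    """Score point reached after counting `count` cases up from the low end."""
    cum_below = 0
    for l, f in zip(lower_limits, freqs):
        if f and cum_below + f >= count:
            return l + width * (count - cum_below) / f
        cum_below += f
    raise ValueError("count exceeds N")


LOWER_LIMITS = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5]  # Table A-3, from 15-19 up
FREQS = [6, 9, 10, 8, 4, 3]
N = sum(FREQS)

q3 = interpolate(LOWER_LIMITS, FREQS, 5, 3 * N / 4)  # third quartile
q1 = interpolate(LOWER_LIMITS, FREQS, 5, N / 4)      # first quartile
q = (q3 - q1) / 2                                    # quartile deviation
print(round(q3, 3), round(q1, 2), round(q, 2))  # 32.625 21.72 5.45
```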


`Тһе Standard Deviation, SD or c (sigma) 


The standard deviation, or σ, is a measure of variability
computed around the mean; hence it is usually calculated from the
same frequency distribution as the mean. SD, or σ, is the most
stable measure of variability within a group and so is regularly
used in research problems which involve correlation and
inference. The computation of σ from a set of ungrouped scores
was outlined on page 22. Calculation of the SD from a frequency
distribution requires a somewhat different procedure. The
method is illustrated in Table A-4 for the same forty scores
tabulated in Table A-1. Steps are as follows:


TABLE A-4
Computation of the Standard Deviation (σ) from a
Frequency Distribution
Data are the forty scores in Table A-1.

Intervals    f     x'     fx'    fx'²
40-44        3      3       9     27
35-39        4      2       8     16
30-34        8      1       8      8
25-29       10      0       0      0
20-24        9     -1      -9      9
15-19        6     -2     -12     24

N = 40;  Σfx' = +25 - 21 = 4;  Σfx'² = 84


(1) Find the deviation (x') of each midpoint from the AM,
as was done in Table A-2. Enter these figures as 1, 2, 3, -1,
-2, that is, in units of interval, in the x' column.

(2) Multiply each x' by its f to give the entries in the fx'
column.

(3) Multiply each x' and its corresponding fx' entry to give
the entries in the fx'² column. For example, x' = 3 times fx' = 9
gives 27 as the fx'² entry.

(4) Sum the fx'² column to give Σfx'².

(5) Compute the correction (c) as in Table A-2. Square c
to get c². Be sure that c is left in units of interval.

(6) Find σ from the following formula:

$$\sigma = i\sqrt{\frac{\Sigma f x'^2}{N} - c^2}$$

In our example, i = 5, Σfx'² = 84, N = 40 and c² = .01 (c = 4/40
= .1). Substituting these values in the formula, we get
σ = 5√(2.10 - .01) = 5√2.09 = 7.23. It will be clear that in
computing σ we make use of the same quantities used in finding
the mean; only the Σfx'² is new.
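The six steps can be traced in a short Python sketch. The midpoints (17, 22, ..., 42) and the assumed mean AM = 27 (the midpoint of 25-29, where x' = 0) are read off Table A-4 and are assumptions of this illustration; the function name `sd_grouped` is hypothetical. As a check, the last lines recompute σ directly from the midpoints about the actual mean, which is AM + c × i = 27 + .1 × 5 = 27.5.

```python
import math

# Table A-4 as (interval midpoint, f), highest interval first.
# Midpoints are assumed to be the middles of 40-44, 35-39, ..., 15-19.
ROWS = [(42, 3), (37, 4), (32, 8), (27, 10), (22, 9), (17, 6)]
WIDTH = 5   # i, the length of each interval
AM = 27     # assumed mean: midpoint of 25-29, where x' = 0


def sd_grouped(rows, width, am):
    """Return (sigma, sum of fx'^2, c) by the assumed-mean method."""
    n = sum(f for _, f in rows)
    xp = [(mid - am) / width for mid, _ in rows]      # step 1: x' in interval units
    fx = [f * x for (_, f), x in zip(rows, xp)]       # step 2: fx'
    fx2 = [x * v for x, v in zip(xp, fx)]             # step 3: fx'^2 = x' times fx'
    sum_fx2 = sum(fx2)                                # step 4
    c = sum(fx) / n                                   # step 5: correction, interval units
    sigma = width * math.sqrt(sum_fx2 / n - c ** 2)   # step 6
    return sigma, sum_fx2, c


sigma, sum_fx2, c = sd_grouped(ROWS, WIDTH, AM)
print(sum_fx2, c, round(sigma, 2))

# Check: sigma computed directly about the actual mean.
n = sum(f for _, f in ROWS)
mean = sum(mid * f for mid, f in ROWS) / n
direct = math.sqrt(sum(f * (mid - mean) ** 2 for mid, f in ROWS) / n)
print(mean, round(direct, 2))
```

It should print 84.0 0.1 7.23 and then 27.5 7.23: the assumed-mean shortcut and the direct calculation agree, which is the point of the correction c.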


CORRELATION 


Correlation (page 27) is the correspondence or relationship
between two sets of test scores. Degree of relationship is
expressed by a coefficient of correlation (r) along a scale which
extends from -1.00 through .00 to +1.00. There are several
methods of computing correlation, of which the product-moment
method is the most often employed in dealing with test scores.
Calculation of a product-moment r is illustrated in Table A-5.

Table A-5 shows the computation of the correlation between
test scores in reading and arithmetic achieved by ten children
in the fifth grade. The sample is much too small to give an
adequate indication of the relationship between these two
variables, and our table must be taken as a much simplified
illustration of correlational method.

The coefficient of correlation in Table A-5 is .23, revealing a
positive but quite low relationship between the two tests. The
first test (reading) is designated X, and the second test
(arithmetic) is Y. Note that, in order to compute the correlation,
we must first find the deviation of each child's X-score from Mx
and the deviation of his Y-score from My. Each deviation from
Mx (53) is entered in the x column, and each deviation from
My (21) is entered in the y column. Each x and y is then squared
and entered in the x² and y² columns, and the sums of these two
TABLE A-5
Correlation between Reading and Arithmetic in the
Fifth Grade
(N = 10)

           Reading   Arithmetic
Pupils       (X)        (Y)        x      y      x²     y²     xy
John          60         26        7      5      49     25     35
Carol         55         24        2      3       4      9      6
Ann           63         18       10     -3     100      9    -30
Betty         40         21      -13      0     169      0      0
Louise        52         17       -1     -4       1     16      4
Tom           61         20        8     -1      64      1     -8
Bill          43         15      -10     -6     100     36     60
Joan          56         25        3      4       9     16     12
Dick          44         23       -9      2      81      4    -18
Carl          56         21        3      0       9      0      0
Σ            530        210                     586    116     61

Mx = 53.0    My = 21.0

$$r = \frac{61}{\sqrt{586 \times 116}} = .23$$


columns are found. In the last column (xy), the x and y
deviations of each pupil are multiplied with due regard for sign,
and the sum of the xy column is determined. Finally, the sum of
the xy column is divided by the square root of the product of
Σx² and Σy² to give the coefficient of correlation. The formula is

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$$
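The hand computation of Table A-5 can be checked with a brief Python sketch that follows this deviation-score formula. `PUPILS` and `product_moment_r` are illustrative names, not part of the original text; for a sample this small the code simply restates the table's arithmetic.

```python
import math

# Table A-5: (pupil, reading score X, arithmetic score Y).
PUPILS = [
    ("John", 60, 26), ("Carol", 55, 24), ("Ann", 63, 18), ("Betty", 40, 21),
    ("Louise", 52, 17), ("Tom", 61, 20), ("Bill", 43, 15), ("Joan", 56, 25),
    ("Dick", 44, 23), ("Carl", 56, 21),
]


def product_moment_r(pairs):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), x and y being deviations from the means."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    dev = [(x - mx, y - my) for x, y in pairs]
    sxy = sum(dx * dy for dx, dy in dev)   # products taken with regard for sign
    sx2 = sum(dx * dx for dx, _ in dev)
    sy2 = sum(dy * dy for _, dy in dev)
    return sxy / math.sqrt(sx2 * sy2), sxy, sx2, sy2


r, sxy, sx2, sy2 = product_moment_r([(x, y) for _, x, y in PUPILS])
print(sxy, sx2, sy2, round(r, 2))
```

This should print 61.0 586.0 116.0 0.23, matching the sums and the coefficient in the table.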


The formula for r may be written in a number of ways. The
form selected for use will depend on the character of the data,
size of the sample, purpose of the experimenter, and other
considerations. Whenever N is more than about 50, the correlation
coefficient should be computed from a diagram (see references,
Chapter 2).


APPENDIX B 


PUBLISHERS OF MENTAL TESTS 


Teachers who do much testing should write to the publishers
listed below for their catalogs.

Bureau of Publications, Teachers College, Columbia University,
New York 27, New York.

California Test Bureau, 5916 Hollywood Boulevard, Los Angeles
28, California.

Educational Test Bureau, 720 Washington Avenue, S.E.,
Minneapolis 14, Minnesota.

Educational Testing Service, Cooperative Test Division, 20 Nassau
Street, Princeton, New Jersey.

Houghton Mifflin Company, 2 Park Street, Boston 7,
Massachusetts.

Psychological Corporation, 304 East 45th Street, New York 17,
New York.

Public School Publishing Company, 509-513 North East Street,
Bloomington, Illinois.

Science Research Associates, Inc., 57 West Grand Avenue,
Chicago 10, Illinois.

C. H. Stoelting and Company, 424 North Homan Avenue,
Chicago 24, Illinois.

Stanford University Press, Stanford, California.

World Book Company, 313 Park Hill Avenue, Yonkers 5, New
York.
GLOSSARY

achievement test A test designed to measure pupil performance
in some school subject.

age equivalent The chronological age assigned to an obtained
score on a test, representing the typical (average) age
corresponding to the score. Example: reading age = 8-4.

age norm Typical performance on a test expressed in age
equivalents.

alternate forms of a test Equivalent or parallel forms of a test.

aptitude test A test designed to measure potential ability;
specifically, a test to predict future success in a school subject
or in a vocation.

attitude test A test designed to measure likes or dislikes in a
given area. Example: attitude towards war.

battery A group of tests, often combined into a team, designed
to measure a variety of abilities or aptitudes.

biserial r A coefficient of correlation often used to measure the
discriminative power of an item in item analysis.

central tendency A measure typical of a group of scores; a mean,
median or mode.

chronological age (C.A.) Life age expressed in years and months.
Thus, 10-4 means 10 years and 4 months.

completion items Test questions in which the examinee must fill
in blank spaces in a statement or sentence in order to complete it.

correlation The tendency for one test to be related (or
unrelated) to another test.

criterion Any measure of performance with which a test is
compared in determining validity.
deviation IQ A standard score found by converting raw scores
into a distribution with a mean of 100 and a σ of 15 or 16 points.

diagnostic tests Tests designed to reveal pupils' strengths and
weaknesses in school subjects.

discriminating power A test item which separates good from
poor students has discriminating power.

distractor An option in a multiple-choice test that is incorrect.

essay items Test items calling for a relatively free response.

evaluation Appraisal of a pupil's performance; may include
in-school and out-of-school behaviors.

frequency distribution An arrangement of test scores into groups
in order of size.

grade equivalent The grade score assigned to a given obtained
score on a test. Example: A score of 42 on an achievement test
may have a grade equivalent of 6.5 (halfway through the sixth
grade).

graphic rating scale A rating device in which possession of a
given degree of some trait is indicated by a check along a line.

group test A test that may be administered to all members of a
group or class at the same time.

individual test A test administered to only one person at a time.

IQ (intelligence quotient) Originally, the ratio of mental age to
chronological age when mental age is obtained from an age
scale. Often used loosely to mean any set of scores with a mean
of 100. See deviation IQ.

intelligence tests Tests designed to measure intelligence, which
may be defined as mental alertness or ability to do well in school.

inventory A test or checklist of a person's personal
characteristics, attitudes, or interests.

item A single question on a test.

item analysis The process of determining the difficulty and
validity of test items through statistical analysis.

matching items Test items in which the members of one list are
to be matched against the members of a second list.
mean The arithmetic average of a set of test scores.

median The point that divides a frequency distribution of
scores into two equal parts.

mental age (MA) The age for which an obtained score on an
intelligence test is average or typical.

mode The score which occurs most often in a distribution.

multiple-choice items Test items which call for the selection of
a correct answer from among several options.

normal probability curve A theoretical distribution curve which
many distributions of test scores approximate.

norms Average performances for various groups, expressed as
age or grade equivalents for school children, as percentiles, and
in other ways.

objective test A test answered by checking or circling a number
or letter. Example: true-false test items.

options Responses from among which an examinee must make
a selection.

percentile rank (PR) The equivalent of an obtained score on a
scale of 100 points. Example: If a score of 86 has a percentile
rank (PR) of 63, we know that 63 per cent of the group scored
below 86.

personality test A test (often an inventory) designed to assess
an individual's personal and social behaviors.

power test A test designed to measure level of performance
rather than speed.

profile A graphic device for representing an examinee's scores
on several tests.

projective tests Devices for studying personality through the use
of ink blots, pictures, designs.

quartile deviation (Q) A measure of variability. Q equals
one-half of the range of the middle 50 per cent of scores.

questionnaire A systematic inventory of questions covering
personality traits, attitudes, or interests.

readiness test A measure of a child's readiness or maturity level.
Often used in reading.
reliability Consistency of test scores.

reliability coefficient A correlation coefficient giving the
self-correlation of a test.

skewness The extent to which a distribution of scores is off
center or biased.

sociometry Measurement of interpersonal relations within a class
or other group.

split-half reliability Reliability coefficient found by splitting a
test into halves. The two parts of the test usually consist of odd-
and even-numbered items.

standard deviation (SD or σ) A measure of variability.

standard score A converted or derived score found by expressing
an obtained score as being so far above or below the mean
in SD units.

standardized tests Printed tests for which there are norms on
defined groups. Directions are carefully prescribed.

test-retest reliability The correlation between scores made on
the same test administered on two occasions.

T-score A normalized score.

true-false items Test items which the examinee is to mark as
true or false.

validity The degree to which a test measures what it purports
to measure. There are several sorts of validity.

z-score An obtained score expressed as a deviation from the
test mean in terms of σ. When z-scores are converted into a
frequency distribution with an assigned mean and σ, they are
called standard scores.
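The z-score and standard-score entries describe a two-step linear conversion: z = (X - M)/σ, then standard score = assigned mean + (assigned σ) × z. A minimal Python sketch follows. The mean of 27.5 and σ of 7.23 are those of the forty scores in Table A-4 of Appendix A; the raw score of 35 is a hypothetical example, and the assigned mean of 100 and σ of 15 are taken from the deviation IQ entry. This is a linear conversion only; normalized scores such as T-scores are not obtained this way.

```python
def z_score(score, mean, sd):
    """Express a raw score as a deviation from the mean in sigma units."""
    return (score - mean) / sd


def standard_score(z, assigned_mean, assigned_sd):
    """Re-express a z-score on a scale with an assigned mean and sigma."""
    return assigned_mean + assigned_sd * z


# Hypothetical raw score of 35 in a group with mean 27.5 and sigma 7.23.
z = z_score(35, mean=27.5, sd=7.23)
print(round(z, 2), round(standard_score(z, 100, 15), 1))
```

It should print 1.04 115.6: a score about one σ above the group mean lands about 15 points above 100 on the assigned scale.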
SUBJECT INDEX


Ability, meaning of, 46 

Achievement tests, 106; definition of, 
102; diagnostic uses of, 116; survey, 
106; teacher-made, 210; value of, 115 

Adjustment inventories, 164 

Age norms, 41 

Age scale, 32; value of, 33 

American Council on Education Psychological Examination (ACE), 91

Aptitudes, meaning of, 130

Aptitude tests, art, 151; batteries of, 133; case studies of the use of, 226; clerical, 137; how to judge, 154; interpreting scores in, 155; mechanical, 133; music, 151; use in professional schools, 147

Army Alpha test, 8, 81 

Army Beta test, 7 

Army General Classification Test (AGCT), 8, 81

Art aptitude tests, 151

Arthur Point Scale, 73; use of, in schools, 75

Ascendance-Submission Reaction Study (Allport), 171

Attitudes, 171; questionnaires in the study of, 172

Bell Adjustment Inventory, 169

Bennett Mechanical Comprehension Test, 135

Binet, Alfred, 6; characteristics of his tests, 6-7

Biserial r, in item analysis, 215-216

California Achievement Tests, 111; characteristics of, 113

California Arithmetic Test, 212

California Test of Mental Maturity, 84; description of, 85-87

California Test of Personality, 166-167
Central tendency, meaning of, 19 

Clerical aptitude tests, 137

Columbia Research Bureau Spanish Test, 211

Combining test scores, 34-37

Completion-test items, 202; illustrations of, 203-204

Content analysis, 31, 126, 213

Cooperative French Test, 123

Cooperative General Achievement Tests, 113-114

Cooperative Mathematics Test, 122

Cooperative Science Test, 125

Correction for guessing, 191; when to use, 194

Correlation, meaning of, 27-28

Correlation coefficient, computation of, 251-252

Criteria, in validity, 154

Diagnostic tests, clinical, 57-59, 67-68; differential, 140-144; educational, 52-56, 68, 70-72, 75-77, 93-95

Diagnostic Tests of Achievement in Music, 152-153

Differential Aptitude Tests (DAT), 140-144

Educational achievement tests, 102-103; and intelligence tests, 103; compared with school examinations, 104-106; in school subjects, 118; how used in schools, 115-118; what to look for in, 125-128

Educational age (EA), 127

Essay tests, described, 204-205; how to …, 205-206; scoring in, 206-

Evaluation and Adjustment Series, 122 


Frequency distribution, 14-15; rules for 
constructing, 239-241 
Frequency polygon, 15; how to construct, 241-242

Galton, Francis, role in testing movement, 5-6

General Clerical Test, 139

Gordon's Personal Profile and Personal Inventory, 168-169

Grade norms, 41

Group tests, of intelligence, 80-82; in guidance, 93-95; norms in, 98; reliability of, 97; scaling in, 97; use in schools, 92-96

Guidance, educational and vocational, 93-95, 115-117, 229-233

Halo effect in ratings, 162

Histogram, 16-17; how to construct, 242-243

Individual differences, importance of, 3, 12

Intelligence, meaning of, 45; levels of, 46-47

Intelligence quotient (IQ), Stanford-Binet, 51-52; constancy of, 61-63; distribution of, 53-54; precautions in interpreting, 60-61; stability of, 56-57

Intelligence quotient (IQ), Wechsler-Bellevue, 65-67; in diagnosis, 67-68; range of, 67

Intelligence tests, factors in the choice of, 96-100; group, 80-81; individual, 44-45; performance, 72-75

Interest inventories, 174-180

Iowa Silent Reading Tests, 121-122

IQ (intelligence quotient), 33; as standard score, 39; as ratio, 52. See Intelligence quotient

Item analysis, 214-221; short method of, 219-221

Item (test), difficulty of, 213; selection of, 212-213; validity of, 214

Kuder Preference Record, 177-179

Kuhlmann-Anderson Intelligence Tests, 88-89

Law School Admission Test, 149

MacQuarrie Test of Mechanical Ability, 134-135

Matching items, 199; illustrations of, 200-202

Mean, 20; in frequency distribution, 
243-246 

Mechanical aptitude tests, 133-137 


Median, 20; in frequency distribution, 246-247

Medical College Admission Test, 148-149

Meier Art Judgment Test, 153-154

Mental age (MA), 32-33

Mental tests, classification of, 3-5; compared with physical, 2-3; history of, 5-12; uses of, in schools, 12

Metropolitan Achievement Tests, 109-111

Metropolitan Readiness Tests, 118-120

Minnesota Clerical Test, 137-139

Minnesota Paper Formboard Test, 144-145

Mode, 20-21, 247-248
Multiple-choice items, 193-194; illus- 
trations of, 195-197 
Multiple response items, 198-199 
Murphy-Durrell Diagnostic Reading 
Readiness Test, 145-146 
Musical aptitude tests, 151-153 


National Teacher Examination, 150 
Nelson-Denny Reading Test, 212

Normal distribution, 17; uses of, in testing, 17-19

Normal probability curve, 17; areas under, 23

Norms, 40; age, 33; percentile, standard scores as, 36-38

Objectives, educational, 105-106

Objective tests, 80, 105; compared with essay examinations, 185-185; item types in, 185

Occupational Interest Inventory, 175-177
Orleans Algebra Prognosis Test, 146-

Otis Quick-Scoring Mental Ability Tests, 87-88

Percentile rank, 25-27; advantages of, 33-36; limitations of, 36; norms in terms of, 35

Percentile scale, 33-36

Performance tests, 72-75
Personality, meaning of, 157-158; inventories in the measurement of, 164; rating scales in the measurement of, 158-163; sociometric techniques in, 233-237

Personality inventories, 164-180; summary on the use of, 180-182

Pintner's Aspects of Personality, 167-168

Pintner-Cunningham Primary Tests, 82-84

Pre-Engineering Ability Tests, 149-150

Profiles, use of, in comparing test results, 35, 85, 143

Projective tests, meaning of, 158 


Quartile, meaning of, 24-25 


Quartile deviation (Q), calculation of, 248-249

Questionnaires, 164

r, coefficient of correlation, 27-28; calculation of, 251-252

Range of scores, 14, 248

Rating scales, 158-160; factors affecting, 160-162; improvement in, 162-163; summary on, 163

Reliability of a test, coefficient of, 27-29; parallel forms in, 29; split-half … in, 223-224; test-retest in,

Seashore Measures of Musical Talent, 151-152


Selection of tests, factors in, 96-100, 125-128, 154-158

Sequential Tests of Educational Progress, 114-115

Sigma scores, meaning of, 36

Sociometric techniques, 233-237

Standard deviation, 22; calculation of, in simple series, 22; in a frequency distribution, 249-251

Standard error, of a score, 29-30

Standard scores, 36; computation of, 36-38; normalized, or T-scores, 40

Standardized tests, 210-211

Stanford Achievement Tests, 106-109

Stanford-Binet Intelligence Scale, 47-52; reliability of, 56-57; scoring in, 51; uses of, in the schools, 52-60; validity of, 61-63

Strong Vocational Interest Blank, 179-180

Study of Values (Allport-Vernon-Lindzey), 172-173

Teacher-made tests, 219 ff.

Terman-McNemar Test of Mental Ability

True-False items, 189-190; illustrations of, 192-193

T-score, 40

Turse Shorthand Aptitude Test, 147

Validity, of a test, 30-31; of test items, 214

Variability, in scores, 21

Verbal ability, and performance ability, 64-65, 68, 75-77


Wechsler Adult Intelligence Scale, 63

Wechsler Intelligence Scale for Children, 68-69; compared with Stanford-Binet, 70; MA in, 72; range and stability of IQ's in, 71


Z-Score, 36 